(57) Abstract

In accordance with the present invention, there are provided novel Down Syndrome-Cell Adhesion Molecule (DS-CAM) proteins. 
Nucleic acid sequences encoding such proteins and assays employing same are also disclosed. The invention DS-CAM proteins can be 
employed in a variety of ways, for example, for the production of anti-DS-CAM antibodies thereto, in therapeutic compositions and 
methods employing such proteins and/or antibodies. DS-CAM proteins are also useful in bioassays to identify agonists and antagonists 
thereto.

DS-CAM PROTEINS
AND PRODUCTS RELATED THERETO

This is a non-provisional application based on,
and claims the benefit of, U.S. Provisional Application
No. 60/029,322 filed October 25, 1996, the content of
which is incorporated herein by reference in its
entirety.

ACKNOWLEDGMENT

This invention was made with Government support
under Grant Numbers HL50025 and HD17449 awarded by the
National Institutes of Health and DE-FG03-92ER61402
awarded by the Department of Energy. The Government has
certain rights in this invention.

FIELD OF THE INVENTION

The present invention relates to nucleic acids
and proteins encoded thereby. Invention nucleic acids
encode a novel N-CAM member of the immunoglobulin
superfamily of proteins. The invention also relates to
methods for making and using such nucleic acids and
proteins.

BACKGROUND OF THE INVENTION

Research spanning the last decade has
significantly elucidated the molecular events attending
cell-cell interactions in the body, especially those
events involved in the movement and activation of cells
in the immune system. See generally, Springer et al.,
Nature 346:425-434, 1990. Cell surface proteins, and
especially the so-called Cellular Adhesion Molecules
("CAMs"), have correspondingly been the subject of
pharmaceutical research and development having as its
goal intervening in the processes of leukocyte
extravasation to sites of inflammation and leukocyte
movement to distinct target tissues. The isolation and
characterization of cellular adhesion molecules, the
cloning and expression of DNA sequences encoding such
molecules, and the development of therapeutic and
diagnostic agents relevant to inflammatory processes, viral
infection and cancer metastasis have also been the
subject of numerous U.S. and foreign applications for
Letters Patent. See Edwards, Current Opinion in
Therapeutic Patents 1(11):1617-1630, 1991 and
particularly the published "patent literature references"
cited therein.

Numerous CAMs have been characterized to date.
See, for example, vascular adhesion molecule (VCAM-1) as
described in PCT WO 90/13300; platelet endothelial cell
adhesion molecule (PECAM-1) described in Newman et al.,
Science 247:1219-1222, 1990; and PCT WO 91/10683; and the
following U.S. Patents: 5,525,487; 5,235,049; 5,272,263;
5,489,233; 5,264,554; 5,318,890; 5,389,520; 5,519,008;
and the like.

There is substantial evidence that N-CAM and
its relatives play an important part in neural
development (Edelman and Crossin, "CELL ADHESION
MOLECULES: Implications for a Molecular Histology", Ann.
Rev. Biochem. 60:155-190, 1991; and Walsh and Doherty,
Curr. Opinion in Cell Biol. 5:791-796, 1993). For
example, antibodies directed against N-CAMs disturbed the
normal growth pattern of nerve processes. N-CAM (locus
11q23.1) is expressed in large amounts in cells of the
developing neural tube, but when neural crest cells
dissociate from the neural tube and migrate away, they
lose N-CAM, only to reexpress it later when they
reaggregate to form a neural ganglion. In addition,



WO 98/17795 PCT/US97/19547

Rosenthal et al. (Nature Genet. 2:107-112, 1992)
reported that mutations in CAM-L1 (locus Xq28) cause
X-linked hydrocephalus, and Jouet et al. (Nature Genet.
7:402-407, 1994) showed that mutations in the CAM-L1 gene
are responsible for type 1 X-linked spastic paraplegia and
MASA syndrome, which shows agenesis of the corpus
callosum. Therefore, there is a need in the art to
identify and isolate novel N-CAM members of the
immunoglobulin superfamily so that their role in neural
development and neural cell communication can be
determined.

Therefore, there continues to be a need in the
art for the discovery of additional proteins
participating in human cell-cell interactions and
especially a need for information serving to specifically
identify and characterize such proteins in terms of their
amino acid sequence. Moreover, to the extent that such
molecules might form the basis for the development of
therapeutic and diagnostic agents, it is essential that
the DNA encoding them be elucidated. The present
invention satisfies this need and provides related
advantages as well.



BRIEF DESCRIPTION OF THE INVENTION

In accordance with the present invention, there
are provided isolated nucleic acids encoding novel
mammalian N-CAM (neural-cell adhesion molecule) members
of the immunoglobulin superfamily of proteins, referred
to herein as Down Syndrome-Cell Adhesion Molecules
(DS-CAMs). Further provided are vectors containing
invention nucleic acids, probes that hybridize thereto,
host cells transformed therewith, antisense
oligonucleotides thereto and related compositions. The
nucleic acid molecules described herein can be
incorporated into a variety of recombinant expression
systems known to those of skill in the art to readily
produce isolated DS-CAM proteins. In addition, the
nucleic acid molecules of the present invention are
useful as probes for assaying for the presence and/or
amount of a DS-CAM gene or mRNA transcript in a given
sample. The nucleic acid molecules described herein, and
oligonucleotide fragments thereof, are also useful as
primers and/or templates in a PCR reaction for amplifying
genes encoding DS-CAM proteins.

In accordance with the present invention, there
are also provided isolated mammalian DS-CAM proteins.
These proteins are useful, for example, in neural
prosthetic devices used in entubulation methods of
repairing (regenerating) damaged or severed peripheral
nerves (see, e.g., U.S. Patent No. 4,955,892,
incorporated herein by reference). In addition, these
proteins, or fragments thereof, are useful as immunogens
for producing anti-DS-CAM antibodies, or in therapeutic
compositions containing such proteins and/or antibodies.
Invention DS-CAM proteins are also useful in bioassays to
identify agonists and antagonists thereto. Also provided
are transgenic non-human mammals that express the
invention protein.

Antibodies that are immunoreactive with
invention DS-CAM proteins are also provided. These
antibodies are useful in diagnostic assays to determine
levels of DS-CAM proteins present in a given sample,
e.g., tissue samples, Western blots, and the like. The
antibodies can also be used to purify DS-CAM proteins
from crude cell extracts and the like. Moreover, these
antibodies are considered therapeutically useful to
counteract or supplement the biological effect of DS-CAMs
in vivo.

Methods and diagnostic systems for determining
the levels of DS-CAM protein in various tissue samples
are also provided. These diagnostic methods can be used
for monitoring the level of therapeutically administered
DS-CAM protein or fragments thereof to facilitate the
maintenance of therapeutically effective amounts. These
diagnostic methods can also be used to diagnose
physiological disorders that result from abnormal levels
or abnormal structures of the DS-CAM protein.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a physical map of the
localization of the DS-CAM gene to a region between
D21S345 and D21S347 on chromosome 21. The locations of
BAC clones (starting with numbers) and PAC clones
(starting with "P") are indicated by horizontal bars. An
arrowhead indicates a gap in the BAC and PAC contig.
The location of the DS-CAM gene is indicated by a thick
arrow.

Figure 2 shows the predicted amino acid
sequence of the human DS-CAM1 protein corresponding to
SEQ ID NO:2 and a schematic structure. IG:
Immunoglobulin type-C2 domain. FbN: Fibronectin type III
domain. The bold Cs in the amino acid sequence indicate
cysteine residues forming disulfide bonds in the Ig-like
type-C2 domains. The bold NXS and NXT in the amino acid
sequence correspond to potential N-glycosylation sites.

Figure 3 shows a partial genomic structure of
DS-CAM1 and a deletion contained in DS-CAM2 cDNA clones
(clones pDS-CAM-18 and pDS-CAM-52). The deletion
boundary sequence (GC-AG) suggests an unusual
alternative splicing event. The horizontal bar represents
genomic sequence containing exons of DS-CAM-42. Exons
are indicated by open boxes. Exon-intron boundaries are
defined by a comparison of the cDNA sequence of
pDS-CAM-42 and genomic sequence determined from a BAC
clone.

Figure 4 shows a schematic comparison of
neuronal Ig superfamily members. Ig-like type C2
domains, fibronectin type III domains and transmembrane
domains are indicated. MAG: myelin-associated
glycoprotein, N-CAM: neural cell adhesion molecule,
BIG-1: brain-derived immunoglobulin (Ig) superfamily
molecule-1, DCC: deleted in colorectal carcinoma.

DETAILED DESCRIPTION OF THE INVENTION

In accordance with the present invention, there
are provided isolated nucleic acids, which encode novel
mammalian members of the DS-CAM family of proteins, and
fragments thereof. The phrase "DS-CAM" refers to
substantially pure native DS-CAM protein, or
recombinantly produced proteins, including naturally
occurring allelic variants thereof encoded by mRNA
generated by alternative splicing of a primary
transcript, such as DS-CAM1 (SEQ ID NO:2) and DS-CAM2
(SEQ ID NO:11) disclosed herein, and further including
fragments thereof which retain at least one native
biological activity, such as immunogenicity. In one
aspect, invention DS-CAM proteins, such as DS-CAM1, are
cell-surface glycoproteins that are mobile in the plane
of the membrane. Invention DS-CAM1 proteins contain
extra- and intra-cellular domains that transduce
information from the outside of the cell to the cytoplasm
and the nucleus, thereby determining cell function. In
another aspect, invention DS-CAM proteins, such as
DS-CAM2, are non-membrane bound, soluble proteins.

In one aspect of the invention, DS-CAM proteins
are further characterized as comprising at least 7
immunoglobulin-like (Ig-like) domains homologous to the
immunoglobulin superfamily and 6 type III fibronectin
repeats (see, e.g., Edelman and Crossin, "CELL ADHESION
MOLECULES: Implications for a Molecular Histology", Ann.
Rev. Biochem., 60:155-190, 1991; and Walsh and Doherty,
Curr. Opinion in Cell Biol., 5:791-796, 1993; each of
which is incorporated herein by reference in its
entirety). In another aspect of the invention, DS-CAM
proteins are those proteins comprising at least 8,
preferably at least 9 Ig-like domains, with at least 10
Ig-like domains being especially preferred.

As used herein, "Ig-like domains", or
grammatical variations thereof, refers to the well known
repeats that are common among Cell Adhesion Molecules
(CAMs) (see, e.g., Figure 1A at p. 158 of Edelman and
Crossin, supra, 1991; and Walsh and Doherty, supra, 1993;
each of which is incorporated herein by reference in its
entirety).

The phrase "type III fibronectin repeats",
"fibronectin repeats," or grammatical variations thereof,
refers to the well known repeats that are common among
Cell Adhesion Molecules (CAMs) (see, e.g., Figure 1A at
p. 158 of Edelman and Crossin, supra, 1991; and Walsh and
Doherty, supra, 1993; each of which is incorporated
herein by reference in its entirety).

The invention DS-CAM proteins define a novel
sub-class of the Ig (immunoglobulin) superfamily with
highest homologies to the neural cell adhesion molecules
including BIG-1 (Yoshihara et al., Neuron 13:415-426,
1994), CAM-L1 (Moos et al., Nature 334:701-703, 1988),
DCC (Fearon et al., Science 247:49-56, 1990), neogenin
(Lane et al., Genomics 35:456-465, 1996), and contactin
(Ranscht, J. Cell Biol. 107:1561-1573, 1988) (Figure 4).
It has been found that the structure of invention DS-CAM
proteins is unique within the neural immunoglobulin
superfamily, and is distinctive due to the number of
Ig-like type C2 and fibronectin III domains (10 and 6
respectively) and from the interruption of the fourth and
fifth fibronectin domains by a 10th C2 domain, the
functional significance of which may be of interest. The
novel structure of DS-CAM and its expression throughout
the nervous system during differentiation suggest
interesting roles for the neural CAM in neural
development and function. The location of DS-CAM in a
region critical for DS neurocognitive phenotypes provides
a human model in which to test the significance of these
roles for cognitive function.

The neural Ig-superfamily members play critical
roles in neural development and function and have been
implicated in cell migration and sorting, axon guidance
and fasciculation, formation of neural connections, and
in synaptic plasticity (Edelman and Crossin, supra, 1991;
Walsh and Doherty, supra, 1993; Tessier-Lavigne et al.,
Science 274:1123-1133, 1996; Sfr iptP.r 3l . , Neuron
17:641-654; Shunter et al., Neuron 17:655-657,
1996). These activities are mediated by the homophilic
or heterophilic binding properties of Ig-superfamily
members (Mauro et al., J. Cell Biol. 119:191-202, 1992 and
Milev et al., J. Biol. Chem. 271:15716-15723, 1996), the
binding of Ig-superfamily proteins to extracellular
matrix proteins (Grumet et al., Cell Adhesion Comm.
1:177-190, 1993; Taira et al., Neuron 12:861-872, 1994;
and Zisch et al., J. Cell Biol. 119:203-213, 1992), and
the binding to smaller diffusible chemorepellents or
chemoattractants, for example, DCC and netrin (Keino-Masu
et al., Cell 87:175-185, 1996).

The specificity of DS-CAM expression for the
central nervous system and the timing of its expression
to the period of neurite outgrowth in both the central
and peripheral nervous systems indicate a role for
DS-CAM in early development and differentiation (Examples
4 and 5). Early in development, with the exception
of neural crest precursors, expression is clearly absent
from regions that contain dividing neuroepithelial
precursors such as the ependymal layer of the neural tube
and the ventricular zone of the brain (Altman and Bayer,
Atlas of Prenatal Rat Brain Development, CRC Press, Ann
Arbor, MI, 1995). In the embryo, differentiated neurons
express DS-CAM when they have finished migrating to their
proper positions within the neuroepithelium, during
neurite outgrowth.

Neural crest cells may express DS-CAM while
they are migrating. At 15.5 and 16.5 days post coitum
(dpc), most of the neural crest derived tissues have some
expression, although not all have finished migration. The
continued expression of DS-CAM in the myenteric plexus
after 15.5-16.5 dpc is due to the neural crest cells that
have stopped dividing, although others are in the cell
cycle. Approximately 50% of myenteric ganglia neurons arise
after birth and DS-CAM may be expressed later in this
subset. At later stages, the data suggest that DS-CAM is
down regulated in the neural crest derivatives such as
the myenteric ganglia and ganglia of the pancreas. The
DS-CAM expression in tissues derived from the neural
crest is of interest with respect to the high level
detected in the umbilical cord. The tissue surrounding
the umbilical artery and vein is derived from the neural
crest and functions in coordinating the cardiovascular
changes occurring at birth. The expression detected in
the fetal liver and branchial arches is also derived from
neural crest related to the ductus venosus and ultimately
the ductus arteriosus and cardiac outflow tracts,
respectively.

DS-CAM expression continues post-natally, in
the differentiating regions of the newborn brain, such
as the septum and inferior colliculus, and in the adult
in regions associated with plasticity, such as the
olfactory bulb and hippocampus. When combined with the
evidence for involvement of the Ig superfamily in
determining synaptic strength (Mayford et al., Science
256:638-644, 1992), the continued expression supports a
role for DS-CAM in remodeling, learning and memory. The
expression pattern and the role of dendritic connections
in cell body maintenance indicate that an increase in
DS-CAM expression in DS brain is responsible in part for
the abnormalities of dendritic structure and decreased
intersections seen at four months post-natal in DS
individuals.

Alternatively spliced variants of CAMs have
distinct roles in different parts of the brain, as
demonstrated for closely related Ig-superfamily members,
such as NCAM (Cunningham et al., Science 236:799-806,
1987 and Figarella-Branger et al., J. Neuropathol. Exp.
Neurol. 51:12-23, 1992). The differential expression of
alternatively spliced DS-CAM transcripts encoding DS-CAM1
(SEQ ID NO:2) and DS-CAM2 (SEQ ID NO:11) has likewise
been observed in various parts of the human adult brain.
For example, it has been found that DS-CAM clones
encoding DS-CAM2 contain a small deletion relative to
DS-CAM1, which deletion contains the transmembrane domain
(Example 3 and Figure 3) and results in a stop codon 36
bp downstream. The results of RT-PCR (Example 5)
indicated that all RNAs tested from various human tissues
expressed both the DS-CAM1 and DS-CAM2 transcripts and
that the PCR products had the sequence and size
predicted for the appropriate form. The proximal and
distal borders of the deletion are located within
neighboring exons and reveal variant consensus splice
site sequences (Jackson, Nuc. Acid Res. 19:3795-3798,
1991) with further surrounding homology to the U1
spliceosome RNA.

From Northern analyses (Example 4) a minimum of
three distinct transcripts are recognized by a probe for
the transmembrane domain. From cDNA sequence analyses
(Example 5) two forms of the DS-CAM protein are deduced,
one that generates a transmembrane adhesion molecule and
a second that is deleted for the transmembrane domain,
thereby generating a molecule that is transported to the
extracellular matrix. This mode of generating
extracellular and membrane bound forms of CAMs is in
surprising contrast to the GPI
(glycosylphosphatidylinositol) linkage used by most CAMs,
and would provide a way of generating longer range
homophilic interactions between cells and the
extracellular matrix, which may be significant for cell
migration.

The DS-CAM gene was isolated (as described in
the Examples hereinafter) by using the BAC contig on
21q22.2-q22.3 covering the region between D21S55 and MX1
(Hubert et al., Genomics 41:218-226, 1997). The gene
spans a minimum of 900 kb, estimated by summing the size
of BACs and PACs that are non-overlapping and covered by
the DS-CAM gene (Figure 1). The DS-CAM gene covers a gap
in all physical maps of this region. From hybridization
experiments indicating no signal of the complete cDNA to
BAC 277G10 covering 210 kb, a 5' intron is at least this
size, similar to the first intron of the DCC gene
(Cho et al., Genomics 19:525-531, 1994). Alternatively,
other transcript variants may contain exons located
in this BAC. The gene spans the boundary of bands
21q22.2 and q22.3, a Giemsa-dark and Giemsa-light band,
respectively. The location of the gene for PEP19, a
small 634 bp gene with large introns within the same band
21q22.2 (Cabin et al., Somat. Cell Mol. Genet. 22:167-175,
1996), suggests a general structure of genes in G-bands
having large introns.

The nucleic acid molecules described herein are
useful for producing invention DS-CAM proteins, when such
nucleic acids are incorporated into a variety of protein
expression systems known to those of skill in the art.
In addition, such nucleic acid molecules or fragments
thereof can be labeled with a readily detectable
substituent and used as hybridization probes for assaying
for the presence and/or amount of a DS-CAM gene or mRNA
transcript in a given sample. The nucleic acid molecules
described herein, and fragments thereof, are also useful
as primers and/or templates in a PCR reaction for
amplifying genes encoding the invention protein described
herein.

The term "nucleic acid" (also referred to as
polynucleotides) encompasses ribonucleic acid (RNA) or
deoxyribonucleic acid (DNA), probes, oligonucleotides,
and primers. DNA can be either complementary DNA (cDNA)
or genomic DNA, e.g., a gene encoding a DS-CAM protein.
One means of isolating a nucleic acid encoding a DS-CAM
polypeptide is to probe a mammalian genomic library with
a natural or artificially designed DNA probe using
methods well known in the art. DNA probes derived from
the DS-CAM gene are particularly useful for this purpose.
DNA and cDNA molecules that encode DS-CAM polypeptides
can be used to obtain complementary genomic DNA, cDNA or
RNA from mammalian (e.g., human, mouse, rat, rabbit, pig,
and the like), or other animal sources, or to isolate
related cDNA or genomic clones by the screening of cDNA
or genomic libraries, by methods described in more detail
below. Examples of nucleic acids are RNA, cDNA, or
isolated genomic DNA encoding a DS-CAM polypeptide. Such
nucleic acids may include, but are not limited to,
nucleic acids having substantially the same nucleotide
sequence as set forth in SEQ ID NO:1, SEQ ID NO:7,
SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or at least
nucleotides 453-6185 set forth in SEQ ID NO:1, or
nucleotides 453-5168 set forth in SEQ ID NO:10.

Use of the terms "isolated" and/or "purified"
in the present specification and claims as a modifier of
DNA, RNA, polypeptides or proteins means that the DNA,
RNA, polypeptides or proteins so designated have been
produced in such form by the hand of man, and thus are
separated from their native in vivo cellular environment.
As a result of this human intervention, the recombinant
DNAs, RNAs, polypeptides and proteins of the invention
are useful in ways described herein that the DNAs, RNAs,
polypeptides, or proteins as they naturally occur are not.

As used herein, "mammalian" refers to the
variety of species from which the invention DS-CAM
protein is derived, e.g., human, rat, mouse, rabbit,
monkey, baboon, bovine, porcine, ovine, canine, feline,
and the like. A preferred DS-CAM protein herein is
human DS-CAM.

In one embodiment of the present invention,
cDNAs encoding the invention DS-CAM proteins disclosed
herein include substantially the same nucleotide sequence
as set forth in SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8,
SEQ ID NO:9, or SEQ ID NO:10. Preferred cDNA molecules
encoding the invention proteins include the same
nucleotide sequence as nucleotides 453-6185 set forth in
SEQ ID NO:1, or nucleotides 453-5168 set forth in
SEQ ID NO:10.

As employed herein, the term "substantially the
same nucleotide sequence" refers to DNA having sufficient
identity to the reference polynucleotide, such that it
will hybridize to the reference nucleotide under
moderately stringent hybridization conditions. In one
embodiment, DNA having substantially the same nucleotide
sequence as the reference nucleotide sequence encodes
substantially the same amino acid sequence as that set
forth in SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM
coding region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9,
or a larger amino acid sequence including SEQ ID NO:2 or
SEQ ID NO:11, or the DS-CAM coding region of SEQ ID NO:7,
SEQ ID NO:8 or SEQ ID NO:9. In another embodiment, DNA
having "substantially the same nucleotide sequence" as
the reference nucleotide sequence has at least 60%
identity with respect to the reference nucleotide
sequence. DNA having at least 70%, more preferably at
least 90%, yet more preferably at least 95%, identity to
the reference nucleic acid sequence is preferred.
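
By way of illustration only, the identity thresholds recited above can be expressed as a short Python sketch. The sketch assumes two sequences that have already been aligned to equal length without gaps; the function names and the example sequences are hypothetical and do not appear in the specification, and no particular alignment method is implied.

```python
# Illustrative sketch: percent identity between two pre-aligned,
# equal-length sequences, compared with the thresholds named in the text.

# Identity thresholds from the text, most preferred first (percent).
THRESHOLDS = (95.0, 90.0, 70.0, 60.0)


def percent_identity(reference: str, query: str) -> float:
    """Return the percentage of positions at which the two sequences match."""
    if not reference or len(reference) != len(query):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(reference.upper(), query.upper()) if a == b)
    return 100.0 * matches / len(reference)


def highest_threshold_met(identity: float):
    """Return the most stringent threshold satisfied, or None if below 60%."""
    for threshold in THRESHOLDS:
        if identity >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical 10-base example with a single mismatch at the last position.
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, highest_threshold_met(identity))
```

A sequence falling below 60% identity returns `None` in this sketch, meaning it would not meet the percent-identity embodiment described above.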



This invention also encompasses nucleic acids
which differ from the nucleic acids shown in SEQ ID NO:1,
SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10 but
which have the same phenotype. Phenotypically similar
nucleic acids are also referred to as "functionally
equivalent nucleic acids". As used herein, the phrase
"functionally equivalent nucleic acids" encompasses
nucleic acids characterized by slight and
non-consequential sequence variations that will function in
substantially the same manner to produce the same protein
product(s) as the nucleic acids disclosed herein. In
particular, functionally equivalent nucleic acids encode
polypeptides that are the same as those disclosed herein
or that have conservative amino acid variations, or that
encode larger polypeptides that include SEQ ID NO:2 or
SEQ ID NO:11, or the DS-CAM coding region of SEQ ID NO:7,
SEQ ID NO:8 or SEQ ID NO:9. For example, conservative
variations include substitution of a non-polar residue
with another non-polar residue, or substitution of a
charged residue with a similarly charged residue. These
variations include those recognized by skilled artisans
as those that do not substantially alter the tertiary
structure of the protein.



Further provided are nucleic acids encoding
DS-CAM polypeptides that, by virtue of the degeneracy of
the genetic code, do not necessarily hybridize to the
invention nucleic acids under specified hybridization
conditions. Preferred nucleic acids encoding the
invention polypeptides are comprised of nucleotides that
encode substantially the same amino acid sequences set
forth in SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM
coding region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9.

Thus, an exemplary nucleic acid encoding an
invention DS-CAM protein may be selected from:

(a) DNA encoding the amino acid sequence
set forth in SEQ ID NO:2 or SEQ ID NO:11, or
the DS-CAM coding region of SEQ ID NO:7,
SEQ ID NO:8 or SEQ ID NO:9,

(b) DNA that hybridizes to the DNA of (a)
under moderately stringent conditions, wherein
said DNA encodes biologically active DS-CAM, or

(c) DNA degenerate with respect to either
(a) or (b) above, wherein said DNA encodes
biologically active DS-CAM.

Hybridization refers to the binding of
complementary strands of nucleic acid (i.e.,
sense:antisense strands or probe:target-DNA) to each
other through hydrogen bonds, similar to the bonds that
naturally occur in chromosomal DNA. Stringency levels
used to hybridize a given probe with target-DNA can be
readily varied by those of skill in the art.
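
As a purely illustrative aid, the sense:antisense relationship described above can be sketched in Python. The sketch models only perfect Watson-Crick complementarity between DNA strands; it does not model stringency or partial matches, and the function names and example sequence are hypothetical.

```python
# Illustrative sketch: the antisense strand of a DNA sequence is its
# reverse complement (A pairs with T, C pairs with G).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def is_perfect_antisense(sense: str, candidate: str) -> bool:
    """True when the candidate pairs with every base of the sense strand."""
    return candidate.upper() == reverse_complement(sense).upper()


if __name__ == "__main__":
    sense = "ATGC"  # hypothetical four-base example
    print(reverse_complement(sense))
    print(is_perfect_antisense(sense, "GCAT"))
```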
The phrase "stringent hybridization" is used
herein to refer to conditions under which polynucleic
acid hybrids are stable. As known to those of skill in
the art, the stability of hybrids is reflected in the
melting temperature (Tm) of the hybrids. In general, the
stability of a hybrid is a function of sodium ion
concentration and temperature. Typically, the
hybridization reaction is performed under conditions of
lower stringency, followed by washes of varying, but
higher, stringency. Reference to hybridization
stringency relates to such washing conditions.

As used herein, the phrase "moderately
stringent hybridization" refers to conditions that permit
target-DNA to bind a complementary nucleic acid that has
about 60% identity, preferably about 75% identity, more
preferably about 85% identity to the target DNA; with
greater than about 90% identity to target-DNA being
especially preferred. Preferably, moderately stringent
conditions are conditions equivalent to hybridization in
50% formamide, 5X Denhardt's solution, 5X SSPE, 0.2% SDS
at 42°C, followed by washing in 0.2X SSPE, 0.2% SDS, at
65°C.

The phrase "high stringency hybridization"
refers to conditions that permit hybridization of only
those nucleic acid sequences that form stable hybrids in
0.018M NaCl at 65°C (i.e., if a hybrid is not stable in
0.018M NaCl at 65°C, it will not be stable under high
stringency conditions, as contemplated herein). High
stringency conditions can be provided, for example, by
hybridization in 50% formamide, 5X Denhardt's solution,
5X SSPE, 0.2% SDS at 42°C, followed by washing in 0.1X
SSPE, and 0.1% SDS at 65°C.

The phrase "low stringency hybridization"
refers to conditions equivalent to hybridization in 10%
formamide, 5X Denhardt's solution, 6X SSPE, 0.2% SDS at
42°C, followed by washing in 1X SSPE, 0.2% SDS, at 50°C.
Denhardt's solution and SSPE (see, e.g., Sambrook et al.,
Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory Press, 1989) are well known to those of
skill in the art as are other suitable hybridization
buffers.
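
For convenience of comparison only, the exemplary low, moderate and high stringency conditions recited above can be collected in a small Python structure. The field names and the ordering helper are hypothetical, and the low stringency wash is taken here to be 1X SSPE, which is an assumed reading of the text. The ordering reflects the statement above that stringency relates to the washing conditions: a warmer, lower-salt wash is treated as more stringent.

```python
# Illustrative sketch: the three exemplary hybridization/wash conditions.
from typing import NamedTuple


class Condition(NamedTuple):
    formamide_pct: float   # formamide in hybridization buffer (%)
    hyb_sspe_x: float      # SSPE strength in hybridization buffer (X)
    hyb_temp_c: float      # hybridization temperature (deg C)
    wash_sspe_x: float     # SSPE strength in wash (X)
    wash_sds_pct: float    # SDS in wash (%)
    wash_temp_c: float     # wash temperature (deg C)


# All three use 5X Denhardt's solution and 0.2% SDS during hybridization.
CONDITIONS = {
    "low": Condition(10, 6.0, 42, 1.0, 0.2, 50),   # 1X wash is an assumed reading
    "moderate": Condition(50, 5.0, 42, 0.2, 0.2, 65),
    "high": Condition(50, 5.0, 42, 0.1, 0.1, 65),
}


def by_increasing_wash_stringency() -> list:
    """Order condition names from least to most stringent wash."""
    return sorted(
        CONDITIONS,
        key=lambda name: (CONDITIONS[name].wash_temp_c, -CONDITIONS[name].wash_sspe_x),
    )


if __name__ == "__main__":
    print(by_increasing_wash_stringency())
```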

As used herein, the term "degenerate" refers to
codons that differ in at least one nucleotide from a
reference nucleic acid, e.g., SEQ ID NO:1, but encode the
same amino acids as the reference nucleic acid. For
example, codons specified by the triplets "UCU", "UCC",
"UCA", and "UCG" are degenerate with respect to each
other since all four of these codons encode the amino
acid serine.
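
The serine example above can be checked with a minimal Python sketch built on the standard genetic code. The table construction and the helper name `are_degenerate` are illustrative assumptions and form no part of the specification.

```python
# Illustrative sketch: two distinct codons are "degenerate" with respect
# to each other when they encode the same amino acid.
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def are_degenerate(codon_a: str, codon_b: str) -> bool:
    """True when two different codons encode the same amino acid."""
    a, b = codon_a.upper(), codon_b.upper()
    return a != b and CODON_TABLE[a] == CODON_TABLE[b]


if __name__ == "__main__":
    for codon in ("UCU", "UCC", "UCA", "UCG"):
        print(codon, CODON_TABLE[codon])
    print(are_degenerate("UCU", "UCG"))
```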

Preferred nucleic acids encoding the invention
polypeptide(s) hybridize under moderately stringent,
preferably high stringency, conditions to substantially
the entire sequence, or in certain embodiments
substantial portions (i.e., typically at least 15-30
nucleotides) of the nucleic acid sequence set forth in
SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or
SEQ ID NO:10.

The invention nucleic acids can be produced by
a variety of methods well-known in the art, e.g., the
methods described herein, employing PCR amplification
using oligonucleotide primers from various regions of
SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9,
SEQ ID NO:10, and the like.

In accordance with a further embodiment of the
present invention, optionally labeled DS-CAM-encoding
cDNAs, or fragments thereof, can be employed to probe
library(ies) (e.g., cDNA, genomic, and the like) for
additional nucleic acid sequences encoding novel
mammalian DS-CAM proteins. As described in Example 3,
construction of mammalian cDNA libraries, preferably a
human trisomy 21 fetal brain cDNA library, is well-known
in the art. Screening of such a cDNA library is
initially carried out under low-stringency conditions,
which comprise a temperature of less than about 42°C, a
formamide concentration of less than about 50%, and a
moderate to low salt concentration.

Presently preferred probe-based screening
conditions comprise a temperature of about 37°C, a
formamide concentration of about 20%, and a salt
concentration of about 5X standard saline citrate (SSC;
20X SSC contains 3M sodium chloride, 0.3M sodium citrate,
pH 7.0). Such conditions will allow the identification
of sequences which have a substantial degree of
similarity with the probe sequence, without requiring
perfect homology. The phrase "substantial similarity"
refers to sequences which share at least 50% homology.
Preferably, hybridization conditions will be selected
which allow the identification of sequences having at
least 70% homology with the probe, while discriminating
against sequences which have a lower degree of homology
with the probe. As a result, nucleic acids having
substantially the same nucleotide sequence as nucleotides
453-6185 set forth in SEQ ID NO:1, or nucleotides
453-5168 set forth in SEQ ID NO:10, SEQ ID NO:7,
SEQ ID NO:8, or SEQ ID NO:9 are obtained.
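
By way of illustration only, the salt concentration of the preferred 5X SSC buffer follows by simple dilution from the 20X stock composition stated above (3M sodium chloride, 0.3M sodium citrate). The sketch below works through that arithmetic and checks the preferred temperature and formamide values against the low-stringency bounds given earlier. The function names are hypothetical, the word "about" in the text is treated as an exact cutoff purely for simplicity, and the salt criterion ("moderate to low") is not quantified in the text and is therefore not checked.

```python
# Composition of the 20X SSC stock, as stated in the text (mol/L).
STOCK_FOLD = 20.0
STOCK_MOLARITY = {"sodium chloride": 3.0, "sodium citrate": 0.3}


def ssc_molarity(fold):
    """Molar concentration of each solute in `fold`X SSC,
    obtained by linear dilution of the 20X stock."""
    return {
        solute: molarity * fold / STOCK_FOLD
        for solute, molarity in STOCK_MOLARITY.items()
    }


def is_low_stringency(temp_c, formamide_pct):
    """Check temperature and formamide against the stated
    low-stringency bounds (< 42 degrees C, < 50% formamide).
    Assumption: 'about' is read as a hard cutoff."""
    return temp_c < 42.0 and formamide_pct < 50.0


five_x = ssc_molarity(5)
print(five_x)                     # NaCl 0.75 M, citrate 0.075 M
print(is_low_stringency(37, 20))  # preferred screening conditions
```

Under these assumptions, 5X SSC corresponds to 0.75M sodium chloride and 0.075M sodium citrate, and the preferred conditions of about 37°C and about 20% formamide fall within the low-stringency range.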

As used herein, a nucleic acid "probe" is
single-stranded DNA or RNA, or analogs thereof, that has
a sequence of nucleotides that includes at least 14, at
least 20, at least 50, at least 100, at least 200, at
least 300, at least 400, or at least 500 contiguous bases
that are the same as (or the complement of) any
contiguous bases set forth in any of SEQ ID NO:1,
SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or SEQ ID NO:10.
Preferred regions from which to construct probes include
5' and/or 3' coding regions of SEQ ID NO:1, SEQ ID NO:7,
SEQ ID NO:8, SEQ ID NO:9 or SEQ ID NO:10. In addition,
the entire cDNA encoding region of an invention DS-CAM
protein, or the entire sequence corresponding to SEQ ID
NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or SEQ ID
NO:10, may be used as a probe. Probes may be labeled by
methods well-known in the art, as described hereinafter,
and used in various diagnostic kits.
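
By way of illustration only, the contiguous-base criterion in the above definition of "probe" can be sketched as a longest-shared-run test between a candidate sequence and a reference sequence. The sketch assumes that "the complement of" is read 5' to 3', i.e., as the reverse complement; the function names and the toy reference string are hypothetical and do not correspond to any sequence of the Sequence Listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _as_dna(seq):
    """Uppercase the sequence and write RNA bases as DNA (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return _as_dna(seq).translate(_COMPLEMENT)[::-1]


def longest_shared_run(a, b):
    """Length of the longest run of contiguous bases common to a and b
    (classic longest-common-substring dynamic programme)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def qualifies_as_probe(candidate, reference, min_contiguous=14):
    """True if the candidate shares at least `min_contiguous` contiguous
    bases with the reference or with its reverse complement."""
    cand, ref = _as_dna(candidate), _as_dna(reference)
    run = max(
        longest_shared_run(cand, ref),
        longest_shared_run(cand, reverse_complement(ref)),
    )
    return run >= min_contiguous


# Toy reference, for demonstration only.
toy_reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(qualifies_as_probe(toy_reference[3:20], toy_reference))
print(qualifies_as_probe(reverse_complement(toy_reference[3:20]), toy_reference))
print(qualifies_as_probe("ACGTACGTAC", toy_reference))
```

The min_contiguous parameter can be raised to 20, 50, 100 or more to reflect the longer probe lengths recited above.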

As used herein, the terms "label" and
"indicating means" in their various grammatical forms
refer to single atoms and molecules that are either
directly or indirectly involved in the production of a
detectable signal. Any label or indicating means can be
linked to invention nucleic acid probes, expressed
proteins, polypeptide fragments, or antibody molecules.
These atoms or molecules can be used alone, or in
conjunction with additional reagents. Such labels are
themselves well-known in clinical diagnostic chemistry.

The labeling means can be a fluorescent
labeling agent that chemically binds to antibodies or
antigens without denaturation to form a fluorochrome
(dye) that is a useful immunofluorescent tracer. A
description of immunofluorescent analytic techniques is
found in DeLuca, "Immunofluorescence Analysis", in
Antibody As a Tool, Marchalonis et al., eds., John Wiley
& Sons, Ltd., pp. 189-231, 1982, which is incorporated
herein by reference.

WO 98/17795    PCT/US97/19547

In one embodiment, the indicating group is an
enzyme, such as horseradish peroxidase (HRP), glucose
oxidase, and the like. In another embodiment,
radioactive elements are employed as labeling agents. The
linking of a label to a substrate, i.e., labeling of
nucleic acid probes, antibodies, polypeptides, and
proteins, is well known in the art. For instance, an
invention antibody can be labeled by metabolic
incorporation of radiolabeled amino acids provided in the
culture medium. See, for example, Galfre et al., Meth.
Enzymol. 73:3-46, 1981. Conventional means of protein
conjugation or coupling by activated functional groups
are particularly applicable. See, for example, Aurameas
et al., Scand. J. Immunol. 8(7):7-23, 1978; Rodwell et
al., Biotech. 3:889-894, 1984; and U.S. Patent
No. 4,493,795.

In accordance with another embodiment of the
present invention, there are provided isolated mammalian
DS-CAM proteins (preferably human), polypeptides, and
fragments thereof encoded by invention nucleic acid.
Preferably, DS-CAM proteins referred to herein are those
polypeptides specifically recognized by an antibody that
also specifically recognizes a DS-CAM protein including
the sequence set forth in SEQ ID NO:2 or SEQ ID NO:11, or
the DS-CAM coding region of SEQ ID NO:7, SEQ ID NO:8 or
SEQ ID NO:9. Invention isolated DS-CAM proteins are free
of cellular components and/or contaminants normally
associated with a native in vivo environment.

The invention DS-CAM proteins are further
characterized as being primarily expressed in fetal brain
and not expressed in fetal lung or fetal liver. For
example, the results of Northern analysis (described in
Example 4) using human fetal tissues showed that 8.5 kb
and 7.6 kb transcripts are expressed only in fetal brain
and not expressed in fetal lung, fetal liver and fetal
kidney. Northern blot analyses of adult tissues
revealed differential expression of three alternative
transcripts of 9.7 kb, 8.5 kb and 7.6 kb in different
substructures of the brain. The 9.7 kb transcript is
highly expressed in the substantia nigra, moderately
expressed in the amygdala and hippocampus, and less
expressed in the whole brain. A similar pattern is
observed by using a PCR product spanning the 191 bp
deletion found in DS-CAM-18 and DS-CAM-52. The placenta
shows faint bands, and the sizes are smaller than those
in brain. In skeletal muscle, a faint band (6.5 kb) is
detected.

The results of RT-PCR (Example 5) demonstrated
expression of human DS-CAM mRNA in fetal and adult brain,
in fetal kidney, as well as in a breast carcinoma cell
line mRNA. Thus, splice variant cDNA transcripts
encoding a DS-CAM family of proteins are clearly
contemplated by the present invention.

The region of chromosome locus 21q22.2 from
which DS-CAM is derived is part of the candidate region
for holoprosencephaly type I (HPE1). In addition, some
patients with this region hemizygously deleted show
abnormalities of the corpus callosum and schizencephaly.
Therefore, DS-CAM is contemplated as the gene, which when
defective, deleted or present as a duplication, is
responsible for holoprosencephaly, agenesis of the corpus
callosum and/or structural defects of the brain. In
addition, DS-CAM may also be responsible for several
phenotypes of Down Syndrome including mental retardation
as well as, more specifically, the abnormal dendritic
structure observed in Down Syndrome. Additional roles
for DS-CAM were further evaluated by database homology
searches using BLAST X/N and TIGR database analyses.
Results of these searches indicate that DS-CAM shows
moderate homology to N-CAM-1 (Cunningham et al., Science,
236:799-806, 1987) and to DCC (Fearon et al., Science,
247:49-56, 1990).



Presently preferred DS-CAM proteins of the
invention include amino acid sequences that are
substantially the same as the protein sequence set forth
in SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM coding
region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9, as
well as biologically active, modified forms thereof.
Those of skill in the art will recognize that numerous
residues of the above-described sequences can be
substituted with other, chemically, sterically and/or
electronically similar residues without substantially
altering the biological activity of the resulting
receptor species. In addition, larger or smaller
polypeptide sequences containing substantially the same
sequence as SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM
coding region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9,
therein (e.g., splice variants) are contemplated.

As employed herein, the term "substantially the
same amino acid sequence" refers to amino acid sequences
having at least about 50%, preferably at least about 60%,
more preferably at least about 70% identity with respect
to the reference amino acid sequence, and retaining
comparable functional and biological activity
characteristic of the protein defined by the reference
amino acid sequence. In another embodiment of the
invention, preferred invention proteins having
"substantially the same amino acid sequence" will have at
least about 80%, more preferably 90% amino acid identity
with respect to the reference amino acid sequence; with
greater than about 95% amino acid sequence identity being
especially preferred. It is recognized, however, that
polypeptides (or nucleic acids referred to hereinbefore)
containing less than the described levels of sequence
identity arising as splice variants or that are modified
by conservative amino acid substitutions, or by
substitution of degenerate codons are also encompassed
within the scope of the present invention.
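
By way of illustration only, the identity thresholds recited above can be sketched as a calculation over two pre-aligned amino acid sequences. The sketch makes several assumptions not found in the text: the sequences are already aligned to equal length with "-" as the gap character, identity is computed over all aligned columns, and "about" is read as an exact cutoff. The function names are hypothetical, and the sketch does not assess the separate requirement that functional and biological activity be retained.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal
    length. Columns in which both sequences have a gap are ignored;
    a gap opposite a residue counts as a mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def highest_tier(pct):
    """Highest recited identity threshold met by `pct`, or None.
    The 95% tier is 'greater than', the others 'at least'."""
    if pct > 95.0:
        return 95
    for threshold in (90, 80, 70, 60, 50):
        if pct >= threshold:
            return threshold
    return None


# Toy aligned peptides, for demonstration only.
pct = percent_identity("ACDEFGHIKL", "ACDEFAAAAA")
print(pct, highest_tier(pct))
```

Other conventions for the denominator (for example, the length of the shorter sequence) would give different percentages; the text does not specify one.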
The term "biologically active" or "functional",
when used herein as a modifier of invention DS-CAM
protein(s), or polypeptide fragment thereof, refers to a
polypeptide that exhibits functional characteristics
similar to DS-CAM. For example, one biological activity
of DS-CAM is the ability to act as an immunogen for the
production of polyclonal and monoclonal antibodies that
bind specifically to DS-CAM. Thus, an invention nucleic
acid encoding DS-CAM will encode a polypeptide
specifically recognized by an antibody that also
specifically recognizes the DS-CAM protein including the
sequence set forth in SEQ ID NO:2 or SEQ ID NO:11, or the
DS-CAM coding region of SEQ ID NO:7, SEQ ID NO:8 or
SEQ ID NO:9. Such activity may be assayed by any method
known to those of skill in the art. For example, a
test-polypeptide encoded by a DS-CAM cDNA can be used to
produce antibodies, which are then assayed for their
ability to bind to the protein including the sequence set
forth in SEQ ID NO:2 or SEQ ID NO:11, or the DS-CAM
coding region of SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9.
If the antibody binds to the test-polypeptide and the
protein including the sequence set forth in SEQ ID NO:2
or SEQ ID NO:11, or the DS-CAM coding region of
SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9 with
substantially the same affinity, then the polypeptide
possesses the requisite biological activity.

The invention DS-CAM proteins can be isolated
by a variety of methods well-known in the art, e.g., the
methods described herein, the recombinant expression
systems described herein, precipitation, gel filtration,
ion-exchange, reverse-phase and affinity chromatography,
and the like. Other well-known methods are described in
Deutscher et al., Guide to Protein Purification: Methods
in Enzymology Vol. 182 (Academic Press, 1990), which is
incorporated herein by reference. Alternatively, the
isolated polypeptides of the present invention can be
obtained using well-known recombinant methods as
described, for example, in Sambrook et al., supra,
1989.

An example of the means for preparing the
invention polypeptide(s) is to express nucleic acids
encoding the DS-CAM in a suitable host cell, such as a
bacterial cell, a yeast cell, an amphibian cell (i.e.,
oocyte), or a mammalian cell, using methods well known in
the art, and recovering the expressed polypeptide, again
using well-known methods. Invention polypeptides can be
isolated directly from cells that have been transformed
with expression vectors as described below herein. The
invention polypeptide, biologically active fragments, and
functional equivalents thereof can also be produced by
chemical synthesis. For example, synthetic polypeptides
can be produced using Applied Biosystems, Inc. Model 430A
or 431A automatic peptide synthesizer (Foster City, CA)
employing the chemistry provided by the manufacturer.



The present invention also provides
compositions containing an acceptable carrier and any of
an isolated, purified DS-CAM polypeptide, an active
fragment thereof, or a purified, mature protein and
active fragments thereof, alone or in combination with
each other. These polypeptides or proteins can be
recombinantly derived, chemically synthesized or purified
from native sources. As used herein, the term
"acceptable carrier" encompasses any of the standard
pharmaceutical carriers, such as phosphate buffered
saline solution, water and emulsions such as an oil/water
or water/oil emulsion, and various types of wetting
agents.



Also provided are antisense oligonucleotides
having a sequence capable of binding specifically with
any portion of an mRNA that encodes DS-CAM polypeptides
so as to prevent translation of the mRNA. The antisense
oligonucleotide may have a sequence capable of binding
specifically with any portion of the sequence of the cDNA
encoding DS-CAM polypeptides. As used herein, the phrase
"binding specifically" encompasses the ability of a
nucleic acid sequence to recognize a complementary
nucleic acid sequence and to form double-helical segments
therewith via the formation of hydrogen bonds between the
complementary base pairs. An example of an antisense
oligonucleotide is an antisense oligonucleotide
comprising chemical analogs of nucleotides.
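
By way of illustration only, the base-pair complementarity underlying "binding specifically" can be sketched by deriving, for a chosen window of an mRNA, the DNA oligonucleotide that is its reverse complement. The function name, the window parameters and the toy mRNA string are hypothetical; the sketch addresses Watson-Crick complementarity only and says nothing about stability, uptake or nucleotide analogs.

```python
# Watson-Crick partners, mapping RNA target bases to DNA oligo bases.
_RNA_TO_ANTISENSE_DNA = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna, start, length):
    """Return the DNA oligonucleotide (written 5' to 3') that is
    complementary to mrna[start:start + length].

    Hypothetical helper: the window is chosen by the caller and the
    mRNA may be given with either U or T."""
    if start < 0:
        raise ValueError("start must be non-negative")
    target = mrna.upper().replace("T", "U")[start:start + length]
    if len(target) != length:
        raise ValueError("window extends past the end of the mRNA")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return target.translate(_RNA_TO_ANTISENSE_DNA)[::-1]


# Toy mRNA fragment, for demonstration only.
print(antisense_oligo("AUGGCCAUUGUA", 0, 6))  # GGCCAT
```

Each base of the returned oligonucleotide pairs with the corresponding base of the target window (A with U, T with A, G with C, C with G) when the two strands are aligned antiparallel.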

Compositions comprising an amount of the
antisense oligonucleotide, described above, effective to
reduce expression of DS-CAM polypeptides by passing
through a cell membrane and binding specifically with
mRNA encoding DS-CAM polypeptides so as to prevent
translation and an acceptable hydrophobic carrier capable
of passing through a cell membrane are also provided
herein. Suitable hydrophobic carriers are described, for
example, in U.S. Patent Nos. 5,334,761; 4,889,953;
4,897,355, and the like. The acceptable hydrophobic
carrier capable of passing through cell membranes may
also comprise a structure which binds to a receptor
specific for a selected cell type and is thereby taken up
by cells of the selected cell type. The structure may be
part of a protein known to bind to a cell-type specific
receptor.

Antisense oligonucleotide compositions are
useful to inhibit translation of mRNA encoding invention
polypeptides. Synthetic oligonucleotides, or other
antisense chemical structures are designed to bind to
mRNA encoding DS-CAM polypeptides and inhibit translation
of mRNA and are useful as compositions to inhibit
expression of DS-CAM associated genes in a tissue sample
or in a subject.
In accordance with another embodiment of the
invention, there are provided kits for detecting
mutations, duplications, deletions, rearrangements and
aneuploidies in chromosome 21 at locus q22.2, comprising
at least one invention probe or antisense nucleotide.

The present invention provides means to
modulate levels of expression of DS-CAM polypeptides by
employing synthetic antisense oligonucleotide
compositions (hereinafter SAOC) which inhibit translation
of mRNA encoding these polypeptides. Synthetic
oligonucleotides, or other antisense chemical structures
designed to recognize and selectively bind to mRNA, are
constructed to be complementary to portions of the DS-CAM
coding strand or nucleotide sequences shown in
SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or
SEQ ID NO:10. The SAOC is designed to be stable in the
blood stream for administration to a subject by
injection, or in laboratory cell culture conditions. The
SAOC is designed to be capable of passing through the
cell membrane in order to enter the cytoplasm of the cell
by virtue of physical and chemical properties of the SAOC
which render it capable of passing through cell
membranes, for example, by designing small, hydrophobic
SAOC chemical structures, or by virtue of specific
transport systems in the cell which recognize and
transport the SAOC into the cell. In addition, the SAOC
can be designed for administration only to certain
selected cell populations by targeting the SAOC to be
recognized by specific cellular uptake mechanisms which
bind and take up the SAOC only within select cell
populations.

For example, the SAOC may be designed to bind
to a receptor found only in a certain cell type, as
discussed supra. The SAOC is also designed to recognize
and selectively bind to target mRNA sequence, which may
correspond to a sequence contained within the sequence
shown in SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8,
SEQ ID NO:9 or SEQ ID NO:10. The SAOC is designed to
inactivate target mRNA sequence by either binding thereto
and inducing degradation of the mRNA by, for example,
RNase I digestion, or inhibiting translation of mRNA
target sequence by interfering with the binding of
translation-regulating factors or ribosomes, or inclusion
of other chemical structures, such as ribozyme sequences
or reactive chemical groups which either degrade or
chemically modify the target mRNA. SAOCs have been shown
to be capable of such properties when directed against
mRNA targets (see Cohen et al., TIPS 10:435, 1989 and
Weintraub, Sci. American, January 1990, p. 40; both
incorporated herein by reference).

In accordance with yet another embodiment of
the present invention, there is provided a method for the
recombinant production of invention DS-CAM protein(s) by
expressing the above-described nucleic acid sequences in
suitable host cells. Recombinant DNA expression systems
that are suitable to produce DS-CAM proteins described
herein are well-known in the art. For example, the
above-described nucleotide sequences can be incorporated
into vectors for further manipulation. As used herein,
vector (or plasmid) refers to discrete elements that are
used to introduce heterologous DNA into cells for either
expression or replication thereof.

Suitable expression vectors are well-known in
the art, and include vectors capable of expressing DNA
operatively linked to a regulatory sequence, such as a
promoter region that is capable of regulating expression
of such DNA. Thus, an expression vector refers to a
recombinant DNA or RNA construct, such as a plasmid, a
phage, recombinant virus or other vector that, upon
introduction into an appropriate host cell, results in
expression of the inserted DNA. Appropriate expression
vectors are well known to those of skill in the art and
include those that are replicable in eukaryotic cells
and/or prokaryotic cells and those that remain episomal
or those which integrate into the host cell genome.

As used herein, a promoter region refers to a
segment of DNA that controls transcription of DNA to
which it is operatively linked. The promoter region
includes specific sequences that are sufficient for RNA
polymerase recognition, binding and transcription
initiation. In addition, the promoter region includes
sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase.
These sequences may be cis acting or may be responsive to
trans acting factors. Promoters, depending upon the
nature of the regulation, may be constitutive or
regulated. Exemplary promoters contemplated for use in
the practice of the present invention include the SV40
early promoter, the cytomegalovirus (CMV) promoter, the
mouse mammary tumor virus (MMTV) steroid-inducible
promoter, Moloney murine leukemia virus (MMLV) promoter,
and the like.

As used herein, the term "operatively linked"
refers to the functional relationship of DNA with
regulatory and effector nucleotide sequences, such as
promoters, enhancers, transcriptional and translational
stop sites, and other signal sequences. For example,
operative linkage of DNA to a promoter refers to the
physical and functional relationship between the DNA and
the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that
specifically recognizes, binds to and transcribes the
DNA.
As used herein, expression refers to the
process well-known to those of skill in the art by which
polynucleic acids are transcribed into mRNA and
translated into peptides or proteins and, optionally
thereafter, modified post-translationally. If the
invention nucleic acid is derived from genomic DNA,
expression may, if an appropriate eukaryotic host cell or
organism is selected, include splicing of the mRNA.
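
By way of illustration only, the two core steps of "expression" as defined above, transcription into mRNA and translation into a polypeptide, can be sketched with the standard genetic code. The sketch assumes the input is the coding (sense) strand of an intron-free DNA, begins translation at the first AUG, and ignores splicing and post-translational modification; all names are hypothetical helpers.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand_dna):
    """mRNA with the same sequence as the coding strand (T -> U)."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon
    (or to the end of the last complete codon)."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy coding sequence, for demonstration only: Met-Ala-Ser-stop.
print(translate(transcribe("ATGGCCTCTTAA")))  # MAS
```

Sequences containing characters other than A, C, G, T or U will raise a KeyError in this sketch; handling of ambiguous bases is not attempted.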

Prokaryotic transformation vectors are
well-known in the art and include pBluescript and phage
Lambda ZAP vectors (STRATAGENE, San Diego, CA), and the
like. Other suitable vectors and promoters are disclosed
in detail in U.S. Patent No. 4,798,885, issued January
17, 1989, the disclosure of which is incorporated herein
by reference in its entirety.

Other suitable vectors for transformation of
E. coli cells include the pET expression vectors
(Novagen, see U.S. Patent 4,952,496), e.g., pET11a, which
contains the T7 promoter, T7 terminator, the inducible
E. coli lac operator, and the lac repressor gene; and pET
12a-c, which contain the T7 promoter, T7 terminator, and
the E. coli ompT secretion signal. Another suitable
vector is the pIN-IIIompA2 (see Duffaud et al., Meth. in
Enzymology 153:492-507, 1987), which contains the lpp
promoter, the lacUV5 promoter operator, the ompA
secretion signal, and the lac repressor gene.

Exemplary eukaryotic transformation vectors
include the cloned bovine papilloma virus genome, the
cloned genomes of the murine retroviruses, and eukaryotic
cassettes, such as the pSV-2 gpt system (described by
Mulligan and Berg, Nature 277:108-114, 1979), the
Okayama-Berg cloning system (Mol. Cell Biol. 2:161-170,
1982), and the expression cloning vector described by
Genetics Institute (Science 228:810-815, 1985). These
vectors are available and provide substantial assurance
of at least some expression of the protein of interest in
the transformed eukaryotic cell line.

Particularly preferred base vectors which
contain regulatory elements that can be linked to the
invention DS-CAM-encoding DNAs for transfection of
mammalian cells are cytomegalovirus (CMV) promoter-based
vectors such as pcDNA1 (Invitrogen, San Diego, CA), MMTV
promoter-based vectors such as pMAMNeo (Clontech, Palo
Alto, CA) and pMSG (Pharmacia, Piscataway, NJ), and SV40
promoter-based vectors such as pSVβ (Clontech, Palo Alto,
CA).

In accordance with another embodiment of the
present invention, there are provided "recombinant cells"
containing the nucleic acid molecules (i.e., DNA or mRNA)
of the present invention. Methods of transforming
suitable host cells, preferably bacterial cells, and more
preferably E. coli cells, as well as methods applicable
for culturing said cells containing a gene encoding a
heterologous protein, are generally known in the art.
See, for example, Sambrook et al., supra, 1989.

Exemplary methods of transformation include,
e.g., transformation employing plasmids, viral, or
bacterial phage vectors, transfection, electroporation,
lipofection, and the like. The heterologous DNA can
optionally include sequences which allow for its
extrachromosomal maintenance, or said heterologous DNA
can be caused to integrate into the genome of the host
(as an alternative means to ensure stable maintenance in
the host).



Host organisms contemplated for use in the
practice of the present invention include those organisms
in which recombinant production of heterologous proteins
has been carried out. Exemplary cells for introducing
DNA include cells of mammalian origin (e.g., COS cells,
mouse L cells, Chinese hamster ovary (CHO) cells, human
embryonic kidney (HEK) cells, African green monkey cells
and other such cells known to those of skill in the art),
amphibian cells (e.g., Xenopus laevis oocytes), yeast
cells (e.g., Saccharomyces cerevisiae, Candida
tropicalis, Hansenula polymorpha and P. pastoris; see,
e.g., U.S. Patent Nos. 4,882,279, 4,837,148, 4,929,555
and 4,855,231), bacteria (e.g., E. coli), and the like.

In one embodiment, nucleic acids encoding the
invention DS-CAM proteins can be delivered into mammalian
cells, either in vivo or in vitro, using suitable viral
vectors well-known in the art. Suitable retroviral
vectors, designed specifically for in vivo "gene therapy"
methods, are described, for example, in WIPO Publications
WO 9205266 and WO 9214829, which provide a description of
methods for efficiently introducing nucleic acids into
human cells in vivo. In addition, where it is desirable
to limit or reduce the in vivo expression of the
invention DS-CAM, the introduction of the antisense
strand of the invention nucleic acid is contemplated.

In accordance with yet another embodiment of
the present invention, there are provided anti-DS-CAM
antibodies having specific reactivity with DS-CAM
polypeptides of the present invention. Active fragments
of antibodies are encompassed within the definition of
"antibody". Invention antibodies can be produced by
methods known in the art using invention polypeptides,
proteins or portions thereof as antigens. For example,
polyclonal and monoclonal antibodies can be produced by
methods well known in the art, as described, for example,
in Harlow and Lane, Antibodies: A Laboratory Manual (Cold
Spring Harbor Laboratory, 1988), which is incorporated
herein by reference. Invention polypeptides can be used
as immunogens in generating such antibodies.
Alternatively, synthetic peptides can be prepared (using
commercially available synthesizers) and used as
immunogens. Amino acid sequences can be analyzed by
methods well known in the art to determine whether they
encode hydrophobic or hydrophilic domains of the
corresponding polypeptide. Altered antibodies such as
chimeric, humanized, CDR-grafted or bifunctional
antibodies can also be produced by methods well known in
the art. Such antibodies can also be produced by
hybridoma, chemical synthesis or recombinant methods
described, for example, in Sambrook et al., supra, 1989;
and Harlow and Lane, supra, 1988. Both anti-peptide and
anti-fusion protein antibodies can be used (see, for
example, Bahouth et al., Trends Pharmacol. Sci. 12:338,
1991; Ausubel et al., Current Protocols in Molecular
Biology (John Wiley and Sons, NY 1989), which are
incorporated herein by reference).

Antibody so produced can be used, inter alia,
in diagnostic methods and systems to detect the level of
DS-CAM protein present in a mammalian, preferably human,
body sample, such as tissue or vascular fluid. Such
antibodies can also be used for the immunoaffinity or
affinity chromatography purification of the invention
DS-CAM protein. In addition, methods are contemplated
herein for detecting the presence of DS-CAM polypeptides
on the surface of a cell comprising contacting the cell
with an antibody that specifically binds to DS-CAM
polypeptides, under conditions permitting binding of the
antibody to the polypeptides, detecting the presence of
the antibody bound to the cell, and thereby detecting the
presence of invention polypeptides on the surface of the
cell. With respect to the detection of such
polypeptides, the antibodies can be used for in vitro
diagnostic or in vivo imaging methods.

Immunological procedures useful for in vitro
detection of target DS-CAM polypeptides in a sample
include immunoassays that employ a detectable antibody.
Such immunoassays include, for example, ELISA, Pandex
microfluorimetric assay, agglutination assays, flow
cytometry, serum diagnostic assays and
immunohistochemical staining procedures which are well
known in the art. An antibody can be made detectable by
various means well known in the art. For example, a
detectable marker can be directly or indirectly attached
to the antibody. Useful markers include, for example,
radionucleotides, enzymes, fluorogens, chromogens and
chemiluminescent labels.

Invention anti-DS-CAM antibodies are
contemplated for use herein to modulate the activity of
the DS-CAM polypeptide in living animals, in humans, or
in biological tissues or fluids isolated therefrom.
Accordingly, compositions comprising a carrier and an
amount of an antibody having specificity for DS-CAM
polypeptides effective to block naturally occurring
ligands or other DS-CAM-binding proteins from binding to
invention DS-CAM polypeptides are contemplated herein.
For example, a monoclonal antibody directed to an epitope
of DS-CAM polypeptide molecules present on the surface of
a cell and having an amino acid sequence substantially
the same as an amino acid sequence for a cell surface
epitope of a DS-CAM polypeptide including the amino acid
sequence shown in SEQ ID NO:2 or SEQ ID NO:11, or the
DS-CAM coding region of SEQ ID NO:7, SEQ ID NO:8 or
SEQ ID NO:9, can be useful for this purpose.

The present invention further provides
transgenic non-human mammals that are capable of
expressing exogenous nucleic acids encoding DS-CAM
polypeptides. As employed herein, the phrase "exogenous
nucleic acid" refers to nucleic acid sequence which is
not native to the host, or which is present in the host
in other than its native environment (e.g., as part of a
genetically engineered DNA construct).

Also provided are transgenic non-human mammals
capable of expressing nucleic acids encoding DS-CAM
polypeptides so mutated as to be incapable of normal
activity, i.e., do not express native DS-CAM. The
present invention also provides transgenic non-human
mammals having a genome comprising antisense nucleic
acids complementary to nucleic acids encoding DS-CAM
polypeptides, placed so as to be transcribed into
antisense mRNA complementary to mRNA encoding DS-CAM
polypeptides, which hybridizes to the mRNA and, thereby,
reduces the translation thereof. The nucleic acid may
additionally comprise an inducible promoter and/or tissue
specific regulatory elements, so that expression can be
induced, or restricted to specific cell types. Examples
of nucleic acids are DNA or cDNA having a coding sequence
substantially the same as the coding sequence shown in
SEQ ID NO:1. An example of a non-human transgenic mammal
is a transgenic mouse. Examples of tissue
specificity-determining elements are the metallothionein
promoter and the L7 promoter.

Animal model systems which elucidate the
physiological and behavioral roles of DS-CAM polypeptides
are also provided, and are produced by creating
transgenic animals in which the expression of the DS-CAM
polypeptide is altered using a variety of techniques.
Examples of such techniques include the insertion of
normal or mutant versions of nucleic acids encoding a
DS-CAM polypeptide by microinjection, retroviral
infection or other means well known to those skilled in
the art, into appropriate fertilized embryos to produce a
transgenic animal. See, for example, Hogan et al.,
Manipulating the Mouse Embryo: A Laboratory Manual (Cold
Spring Harbor Laboratory, 1986).

Also contemplated herein is the use of
homologous recombination of mutant or normal versions of
DS-CAM genes with the native gene locus in transgenic
animals, to alter the regulation of expression or the
structure of DS-CAM polypeptides (see Capecchi et al.,
Science 244:1288, 1989; Zimmer et al., Nature 338:150,
1989; which are incorporated herein by reference).
Homologous recombination techniques are well known in the
art. Homologous recombination replaces the native
(endogenous) gene with a recombinant or mutated gene to
produce an animal that cannot express native (endogenous)
protein but can express, for example, a mutated protein
which results in altered expression of DS-CAM
polypeptides.

In contrast to homologous recombination,
microinjection adds genes to the host genome, without
removing host genes. Microinjection can produce a
transgenic animal that is capable of expressing both
endogenous and exogenous DS-CAM protein. Inducible
promoters can be linked to the coding region of nucleic
acids to provide a means to regulate expression of the
transgene. Tissue specific regulatory elements can be
linked to the coding region to permit tissue-specific
expression of the transgene. Transgenic animal model
systems are useful for in vivo screening of compounds for
identification of specific ligands, i.e., agonists and
antagonists, which activate or inhibit protein responses.

Invention nucleic acids, oligonucleotides
(including antisense), vectors containing same,
transformed host cells, polypeptides and combinations
thereof, as well as antibodies of the present invention,
can be used to screen compounds in vitro to determine
whether a compound functions as a potential agonist or
antagonist to invention polypeptides. These in vitro
screening assays provide information regarding the
function and activity of invention polypeptides, which
can lead to the identification and design of compounds
that are capable of specific interaction with one or more
types of polypeptides, peptides or proteins.



In accordance with still another embodiment of
the present invention, there is provided a method for
identifying compounds which bind to DS-CAM polypeptides.
The invention proteins may be employed in a competitive
binding assay. Such an assay can accommodate the rapid
screening of a large number of compounds to determine
which compounds, if any, are capable of binding to DS-CAM
proteins. Subsequently, more detailed assays can be
carried out with those compounds found to bind, to
further determine whether such compounds act as
modulators, agonists or antagonists of invention
proteins.



Another application of the binding assay of the 
invention is the assay of test samples (e.g., biological 
fluids) for the presence or absence of DS-CAM. Thus, for 
example, serum from a patient displaying symptoms thought
to be related to over- or under-production of DS-CAM can 
be assayed to determine if the observed symptoms are 
indeed caused by over- or under-production of DS-CAM. 

In another embodiment of the invention, there
is provided a bioassay for identifying compounds which
modulate the activity of invention DS-CAM polypeptides.
According to this method, invention polypeptides are
contacted with an "unknown" or test substance (in the
presence of a reporter gene construct when antagonist
activity is tested), the activity of the polypeptide is
monitored subsequent to the contact with the "unknown" or
test substance, and those substances which cause the
reporter gene construct to be expressed are identified as
functional ligands for DS-CAM polypeptides.

In accordance with another embodiment of the
present invention, transformed host cells that
recombinantly express invention polypeptides can be
contacted with a test compound, and the modulating
effect(s) thereof can then be evaluated by comparing the
DS-CAM-mediated response (e.g., via reporter gene
expression) in the presence and absence of test compound,
or by comparing the response of test cells with that of
control cells (i.e., cells that do not express DS-CAM
polypeptides) to the presence of the compound.
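
The two comparisons described above (presence versus absence of test
compound, and test cells versus control cells) can be sketched as a
simple calculation. The following Python sketch is illustrative only:
the function names and the reporter readings are hypothetical and are
not data from this specification.

```python
# Hypothetical sketch of the comparisons described above.
# All reporter readings are invented illustration values.


def fold_change(with_compound: float, without_compound: float) -> float:
    """Reporter signal with compound divided by signal without it."""
    if without_compound <= 0:
        raise ValueError("baseline reporter signal must be positive")
    return with_compound / without_compound


def ds_cam_specific_effect(test_with: float, test_without: float,
                           control_with: float, control_without: float) -> float:
    """Fold change in DS-CAM-expressing cells relative to control cells.

    A value near 1.0 suggests the compound acts independently of DS-CAM;
    values well above or below 1.0 suggest a DS-CAM-mediated effect.
    """
    return (fold_change(test_with, test_without)
            / fold_change(control_with, control_without))


if __name__ == "__main__":
    # Arbitrary reporter units: test cells respond, control cells barely do.
    print(ds_cam_specific_effect(400.0, 100.0, 110.0, 100.0))
```

Normalizing against cells that do not express DS-CAM polypeptides is
one way to separate a DS-CAM-mediated response from a non-specific
effect of the compound on the reporter.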

As used herein, a compound or a signal that
"modulates the activity" of invention polypeptides refers
to a compound or a signal that alters the activity of
DS-CAM polypeptides so that the activity of the invention
polypeptide is different in the presence of the compound
or signal than in the absence of the compound or signal.
In particular, such compounds or signals include agonists
and antagonists. An agonist encompasses a compound or a
signal that activates DS-CAM protein expression.
Alternatively, an antagonist includes a compound or
signal that interferes with DS-CAM protein expression.
Typically, the effect of an antagonist is observed as a
blocking of agonist-induced protein activation.
Antagonists include competitive and non-competitive
antagonists. A competitive antagonist (or competitive
blocker) interacts with or near the site specific for
agonist binding. A non-competitive antagonist or blocker
inactivates the function of the polypeptide by
interacting with a site other than the agonist
interaction site.

As understood by those of skill in the art,
assay methods for identifying compounds that modulate
DS-CAM activity generally require comparison to a
control. One type of "control" is a cell or culture
that is treated substantially the same as the test cell
or test culture exposed to the compound, with the
distinction that the "control" cell or culture is not
exposed to the compound. For example, in methods that
use voltage clamp electrophysiological procedures, the
same cell can be tested in the presence or absence of
compound, by merely changing the external solution
bathing the cell. Another type of "control" cell or
culture may be a cell or culture that is identical to the
transfected cells, with the exception that the "control"
cell or culture does not express native proteins.
Accordingly, the response of the transfected cell to
compound is compared to the response (or lack thereof) of
the "control" cell or culture to the same compound under
the same reaction conditions.

Since it is well-known that CAMs interact with
extracellular ligands, it is contemplated that invention
DS-CAM proteins interact with extracellular ligands. In
another embodiment of the present invention, it is
contemplated that invention DS-CAM proteins act
specifically in concert or in competition with other
CAMs. Thus, the present invention contemplates various
bioassays for identifying ligands for invention DS-CAM
proteins. In addition, the present invention
contemplates an assay measuring the effect of
co-expressing during development either normal or
defective invention DS-CAMs with other CAMs known in the
art to assess the resulting phenotype.



In one embodiment of the present invention,
there is provided a bioassay for evaluating whether test
compounds are capable of acting as agonists, which comprises:

    (a) culturing cells containing:
            DNA which expresses DS-CAM
            protein(s) or functional modified
            forms thereof, and
            DNA encoding a reporter protein,
            wherein said DNA is operatively
            linked to a DS-CAM responsive
            transcription element;
        wherein said culturing is carried out in
        the presence of at least one compound
        whose ability to induce signal
        transduction activity of DS-CAM protein is
        sought to be determined, and thereafter

    (b) monitoring said cells for expression of
        said reporter protein.



In another embodiment of the present invention,
the bioassay for evaluating whether test compounds are
capable of acting as antagonists for DS-CAM protein(s) of
the invention, or functional modified forms of said
DS-CAM protein(s), comprises:

    (a) culturing cells containing:
            DNA which expresses DS-CAM
            protein(s), or functional modified
            forms thereof, and
            DNA encoding a reporter protein,
            wherein said DNA is operatively
            linked to a DS-CAM responsive
            transcription element;
        wherein said culturing is carried out in
        the presence of:
            increasing concentrations of at
            least one compound whose ability to
            inhibit signal transduction activity
            of DS-CAM protein(s) is sought to be
            determined, and
            a fixed concentration of at
            least one agonist for DS-CAM
            protein(s), or functional modified
            forms thereof; and thereafter

    (b) monitoring in said cells the level of
        expression of said reporter protein as a
        function of the concentration of said
        compound, thereby indicating the ability
        of said compound to inhibit signal
        transduction activity.

            signal transduction activity of
            DS-CAM protein(s) is sought to be determined, and
            an increasing concentration of
            at least one agonist for DS-CAM
            protein(s), or functional modified
            forms thereof.

In yet another embodiment of the present
invention, it is contemplated that invention DS-CAM
proteins mediate signal transduction through the
modulation of adenylate cyclase. For example, when a
DS-CAM ligand binds to DS-CAM, adenylate cyclase causes
an elevation in the level of intracellular cAMP.
Accordingly, in one embodiment of the present invention,
the bioassay for evaluating whether test compounds are
capable of acting as agonists or antagonists comprises:

    (a) culturing cells containing:
            DNA which expresses DS-CAM
            protein(s) or functional modified
            forms thereof,
        wherein said culturing is carried out in
        the presence of at least one compound
        whose ability to modulate signal
        transduction activity of DS-CAM protein is
        sought to be determined, and thereafter

    (b) monitoring said cells for either an
        increase or decrease in the level of
        intracellular cAMP.

Methods well-known in the art that measure
intracellular levels of cAMP, or measure cyclase
activity, can be employed in binding assays described
herein to identify agonists and antagonists of the
DS-CAM. For example, because activation of some CAMs
results in decreases or increases in cAMP, assays that
measure intracellular cAMP levels can be used to evaluate
recombinant DS-CAMs expressed in mammalian host cells.

As used herein, "ability to modulate signal
transduction activity of DS-CAM protein" refers to a
compound that has the ability to either induce (agonist)
or inhibit (antagonist) signal transduction activity of
the DS-CAM protein.
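
Where reporter expression is monitored as a function of antagonist
concentration at a fixed agonist concentration, as in the antagonist
bioassay described above, the resulting dose-response data can be
summarized by a half-maximal inhibitory concentration. The following
Python sketch is one possible way to estimate that value by log-linear
interpolation; the function name and the data points are hypothetical
and are not results from this specification.

```python
import math
from typing import Optional, Sequence

# Hypothetical sketch: estimate the concentration at which the reporter
# signal falls halfway between the highest and lowest observed values.
# Concentrations must be positive because they are interpolated on a
# log10 scale. The data below are invented for illustration.


def estimate_ic50(concentrations: Sequence[float],
                  responses: Sequence[float]) -> Optional[float]:
    pairs = sorted(zip(concentrations, responses))
    top = max(r for _, r in pairs)
    bottom = min(r for _, r in pairs)
    half = (top + bottom) / 2.0
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        # Find the first interval in which the response crosses the midpoint.
        if r_lo >= half >= r_hi and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None  # the response never crosses the midpoint


if __name__ == "__main__":
    doses = [1.0, 10.0, 100.0, 1000.0]      # arbitrary concentration units
    reporter = [100.0, 80.0, 20.0, 0.0]     # arbitrary reporter units
    print(estimate_ic50(doses, reporter))
```

A fitted sigmoidal model would normally be preferred for real data;
interpolation is used here only to keep the sketch self-contained.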

Each of the invention bioassays (e.g., those
described herein, and the like), can be conducted as
competitive assays by co-expressing one or more members
of the CAM immunoglobulin superfamily of proteins known
in the art, such as N-CAMs, along with invention DS-CAMs.
In addition, one or more members of the CAM
immunoglobulin superfamily of proteins known in the art
can be co-expressed with invention DS-CAMs to evaluate
the agonistic or antagonistic effect on signal
transduction of the non-DS-CAM members acting in concert
with invention DS-CAMs.

In yet another embodiment of the present
invention, the activation of DS-CAM polypeptides can be
modulated by contacting the polypeptides with an
effective amount of at least one compound identified by
the above-described bioassays.

Members of the N-CAM superfamily of
immunoglobulins have previously been implicated in
disease. For example, various alterations of N-CAM
levels have been seen in degenerative disease,
developmental defects, and toxic conditions. Increases
in the levels of N-CAM in the cerebrospinal fluid of
patients with multiple sclerosis have been observed to
parallel their clinical improvement (Massaro et al.,
Ital. J. Neurol. Sci. Suppl. 6:85-88, 1987). Levels of
N-CAM were reported to be elevated in the amniotic fluid
of mothers carrying fetuses with neural tube defects
(Ibsen et al., J. Neurochem. 41:363-366, 1983). Since
many such defects are likely to be due to mechanical
aberrations rather than genetic defects, confirmation of
these results would provide a new diagnostic component
for prenatal testing. Another provocative finding
relates to observations on the stimulation of Golgi
sialyltransferases by lead (Breen and Regan, Development
104:147-154, 1988; and Cookman et al., J. Neurochem.
49:399-403, 1987). Exposure to lead chloride markedly
stimulated sialyltransferase activity from postnatal days
16 to 30 in rats. This time is coincident with the
period when N-CAM normally becomes less sialylated. Thus
exposure to lead at critical developmental periods would
presumably lead to more highly sialylated, less adhesive,
forms of N-CAM: this prevention of E-A conversion could
have significant effects on neural development. E-A
conversion itself has been found to be delayed in the
mouse mutant staggerer (Edelman and Chuong, Proc. Natl.
Acad. Sci. USA, 79:7036-7042, 1982) in conjunction with
the connectivity changes associated with the mutation.

The location and expression of DS-CAM in the
Down Syndrome (DS) phenotype is supported by the studies
of patients with partial trisomy 21. A subset of the DS
features, including the typical facial appearance and
mental retardation, were suggested by duplication of band
21q22 only (Niebuhr, Humangenetik 21:99-101, 1974).
Other studies mapped those features and congenital heart
disease to the region 21q22.2-q22.3 and between D21S267
and MX1/MX2 (Korenberg et al., Am. J. Hum. Genet.
50:294-302, 1992 and Korenberg et al., Proc. Natl. Acad.
Sci. USA 91:4997-5001, 1994), a region of about 4 Mb that
contains DS-CAM. The Ts65Dn mouse model of DS contains
the region of MMU16 (Pgk1-ps1 to MX1/2) that includes
DS-CAM and reveals some of the neurobehavioral features
of DS (Reeves et al., Nature Genet. 11:177-183, 1995 and
Holtzman et al., Proc. Natl. Acad. Sci. USA 93:13333-13338,
1996).



Close to 6% of DS individuals have
Hirschsprung's disease (HSCR) (Garver et al., Clin.
Genet. 28:503-508, 1985) and more than 10% of all HSCR is
associated with DS (Passarge, New Eng. J. Med. 276:138-143,
1967). A modifier region of HSCR on chromosome
21q22 (D21S259 - D21S156) has been reported in non-DS
HSCR (Puffenberger et al., Hum. Mol. Genet. 3:1217-1225,
1994). The DS-CAM gene maps within this small region.
The expression of DS-CAM in the neural crest derived
enteric plexus of the gut was detected by mouse tissue in
situ hybridization (Example 7). The function of the DS-CAM
protein as a neural cell adhesion molecule and the
association of this region of chromosome 21 with HSCR
indicate that DS-CAM can play a role in the migration of
the cranial neural crest that populate this region.
Thus, DS-CAM overexpression is responsible for the
chromosome 21 association in non-DS HSCR and for the HSCR
seen in DS.

Mutations in the molecule CAM-L1, a molecule
more similar to DS-CAM than to N-CAM (Figure 4), have
established roles in human disease. They result in
X-linked hydrocephalus (Rosenthal et al., Nature Genet.
2:107-112, 1992), type 1 X-linked spastic paraplegia and
the MASA syndrome (including mental retardation, aphasia,
shuffling gait, adducted thumb and agenesis of the corpus
callosum) (Jouet et al., Nature Genet. 7:402-407, 1994).
The perturbation of development by the aneuploid
expression of CAM-L1 supports a role for the aneuploid
expression of DS-CAM in the causation of developmental
and neurological abnormalities.



In accordance with another embodiment of the
present invention, there are provided methods for
diagnosing, in a subject, DS-CAM associated disease, such
as mental retardation, holoprosencephaly, agenesis of the
corpus callosum, or schizencephaly, said method comprising:

    detecting, in said subject, a genomic or
    transcribed mRNA sequence including SEQ ID NO:1
    or SEQ ID NO:10, or fragments thereof.

Preferably, the DS-CAM nucleic acids detected in
accordance with the invention diagnostic methods are
either mutated in one form or another (such as point
mutations, deletions, and the like), or are overexpressed
relative to levels of DS-CAM expression in healthy
non-diseased individuals.

In accordance with another embodiment of the
present invention, there are provided diagnostic systems,
preferably in kit form, comprising at least one invention
nucleic acid in a suitable packaging material. The
diagnostic nucleic acids are derived from the
DS-CAM-encoding nucleic acids described herein. In one
embodiment, for example, the diagnostic nucleic acids are
derived from SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8,
SEQ ID NO:9 or SEQ ID NO:10. Invention diagnostic
systems are useful for assaying for the presence or
absence of nucleic acid encoding DS-CAM in either genomic
DNA or in transcribed nucleic acid (such as mRNA or cDNA)
encoding DS-CAM.

A suitable diagnostic system includes at least
one invention nucleic acid, preferably two or more
invention nucleic acids, as a separately packaged
chemical reagent(s) in an amount sufficient for at least
one assay. Instructions for use of the packaged reagent
are also typically included. Those of skill in the art
can readily incorporate invention nucleic acid probes and/or
primers into kit form in combination with appropriate
buffers and solutions for the practice of the invention
methods as described herein.

As employed herein, the phrase "packaging
material" refers to one or more physical structures used
to house the contents of the kit, such as invention
nucleic acid probes or primers, and the like. The
packaging material is constructed by well known methods,
preferably to provide a sterile, contaminant-free
environment. The packaging material has a label which
indicates that the invention nucleic acids can be used
for detecting a particular sequence encoding DS-CAM
including the nucleotide sequence set forth in
SEQ ID NO:1, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9 or
SEQ ID NO:10, thereby diagnosing the presence of, or a
predisposition for, holoprosencephaly, agenesis of the
corpus callosum, or for several phenotypes of Down
Syndrome including mental retardation, and the like. In
addition, the packaging material contains instructions
indicating how the materials within the kit are employed
both to detect a particular sequence and diagnose the
presence of, or a predisposition for, holoprosencephaly,
agenesis of the corpus callosum, or for several
phenotypes of Down syndrome including mental retardation,
and the like.

The packaging materials employed herein in
relation to diagnostic systems are those customarily
utilized in nucleic acid-based diagnostic systems. As
used herein, the term "package" refers to a solid matrix
or material such as glass, plastic, paper, foil, and the
like, capable of holding within fixed limits an isolated
nucleic acid, oligonucleotide, or primer of the present
invention. Thus, for example, a package can be a glass
vial used to contain milligram quantities of a
contemplated nucleic acid, oligonucleotide or primer, or
it can be a microtiter plate well to which microgram
quantities of a contemplated nucleic acid probe have been
operatively affixed.

"Instructions for use" typically include a 
tangible expression describing the reagent concentration 
or at least one assay method parameter, such as the 
relative amounts of reagent and sample to be admixed,
maintenance time periods for reagent/sample admixtures,
temperature, buffer conditions, and the like. 

All U.S. patents and all publications mentioned
herein are incorporated in their entirety by reference
thereto. The invention will now be described in greater
detail by reference to the following non-limiting
examples.

Materials and Methods

Unless otherwise stated, the present invention
was performed using standard procedures, as described,
for example in Maniatis et al., Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York, USA, 1982; Sambrook et al.,
supra, 1989; Davis et al., Basic Methods in Molecular
Biology, Elsevier Science Publishing, Inc., New York,
USA, 1986; or Methods in Enzymology: Guide to Molecular
Cloning Techniques Vol. 152, S. L. Berger and A. R.
Kimmel Eds., Academic Press Inc., San Diego, USA, 1987.

Libraries.

Construction of Bacterial Artificial Chromosome
(BAC) library. BAC library construction of total human
genomic DNA was performed as described in Shizuya et al.,
Proc. Natl. Acad. Sci. USA 89:8794-8797, 1992; and Hubert
et al., Genomics 41:218-226, 1997. Yeast artificial
chromosome (YAC) clones were obtained from the CEPH
mega-YAC library and grown under standard conditions
(Cohen et al., Nature 366:689-701, 1993).

P1 artificial chromosome (PAC) library
construction. A 3X human PAC library, designated RPCI-1
(Ioannou et al., Hum. Genet. 219-220, 1994) was
constructed as described (Ioannou et al., Nat. Genet.
6:84-89, 1994). The library was arrayed in 384 well
dishes. Subsequently, STSs generated by sequencing of
clones using vector primers were used as hybridization
probes to gridded colony filters of the PAC library.

YAC DNA preparation. YAC clones were grown in
selective media, pelleted and resuspended in 3 ml 0.9 M
sorbitol, 0.1 M EDTA pH 7.5, then incubated with 100 U of
lyticase (Sigma, St. Louis, MO) at 37°C for 1 hour. After
centrifugation for 5 minutes at 5,000 rpm, pellets were
resuspended in 3 ml 50 mM Tris pH 7.45, 20 mM EDTA. 0.3 ml
of 10% SDS was added and the mixture was incubated at 65°C
for 30 minutes. One ml of 5 M potassium acetate was
added and tubes were left on ice for 1 hour, then
centrifuged at 10,000 rpm for 10 minutes. Supernatant
was precipitated in 2 volumes of ethanol and pelleted at
6,000 rpm for 15 minutes. Pellets were resuspended in
TE, treated with RNase and reextracted with
phenol-chloroform.

Analysis by fluorescence in situ hybridization
(FISH). PAC or BAC clones were biotinylated by
nick translation in the presence of biotin-14-dATP using
the BioNick Labeling Kit (Gibco-BRL). FISH was performed
essentially as described (Korenberg et al., Cytogenet.
Cell Genet. 69:196-200, 1995). Briefly, 400 ng of probe
DNA was mixed with 8 ng of human Cot 1 DNA (Gibco-BRL)
and 2 µg of sonicated salmon sperm DNA in order to
suppress possible background produced from repetitive
human sequences as well as yeast sequences in the probe.
The probes were denatured at 75°C, preannealed at 37°C for
one hour, and applied to denatured chromosome slides
prepared from normal male lymphocytes (Korenberg et al.,
supra, 1995). Post-hybridization washes were performed
at 40°C in 2X SSC/50% formamide followed by washes in 1X
SSC at 50°C. Hybridized DNAs were detected with
avidin-conjugated fluorescein isothiocyanate (Vector
Laboratories). One amplification was performed by using
biotinylated anti-avidin. For distinguishing chromosome
subbands precisely, a reverse banding technique was used,
which was achieved by chromomycin A3 and distamycin A
double staining (Korenberg et al., supra, 1995). The
color images were captured by using a Photometrics
Cooled-CCD camera and BDS image analysis software (Oncor
Imaging, Inc.).

Southern blot analysis. Gel electrophoresis of
DNA was carried out on 0.8% agarose gels in 1X TBE.
Transfer of nucleic acids to Hybond N+ nylon membrane
(Amersham) was performed according to the manufacturer's
instruction. Probes were labeled using RadPrime Labeling
System (BRL). Hybridization was carried out at 42°C for
16 hours in 50% formamide, 5X SSPE, 5X Denhardt's, 0.1%
SDS, 100 mg/ml denatured salmon sperm DNA. The filters
were washed once in 1X SSC, 0.1% SDS at room temperature
for 20 minutes, and twice in 0.1X SSC, 0.1% SDS for 20
minutes at 65°C. The blots were exposed onto X-ray film
(Kodak, X-OMAT-AR).

Sequencing of PAC and BAC endclones. PAC
clones were inoculated into 500 ml of LB/kanamycin and
grown overnight. BAC clones were inoculated into 500 ml
of LB/chloramphenicol and grown overnight. DNAs were
isolated using QIAGEN columns according to the vendor's
protocol with one additional
phenol/chloroform/isoamylalcohol extraction followed by
one additional chloroform/isoamylalcohol extraction.
Clones were sequenced using the Gibco-BRL cycle
sequencing kit with standard T7 and SP6 primers.

EXAMPLE 1

Construction of BAC Contig

To provide stable clones for gene isolation and
sequencing initiatives in the D21S55 to MX1 region,
contigs were constructed using Bacterial Artificial
Chromosomes (BACs) and P1 Artificial Chromosomes (PACs).
BAC library construction of total human genomic DNA was
performed as described (Shizuya et al., supra, 1992; Kim
et al., Genomics 34:213-218, 1996). A BAC library was
screened using several YACs spanning the region; a PAC
library (Ioannou et al., Nature Genet. 6:84-89, 1994) was
screened using radiolabeled STS PCR products and whole
BACs in gap filling initiatives.

The location of these BAC and PAC clones was
confirmed by fluorescence in situ hybridization (FISH).
Clone to clone Southerns using 24 new STSs (generated
from direct sequencing of BAC and PAC ends) along with 35
pre-existing STSs were used to show overlaps between BACs
and PACs. The STS density over the intervals covered in
BACs and PACs was 1 STS every 60 kb, and 79% of the
clones were positive for 2 or more STSs. Approximately
3.5 Mb of the 4-5 Mb D21S55 to MX1 interval is covered in
85 BACs and 25 PACs representing 4-fold coverage within
the contigs (Hubert et al., Genomics 41:218-226, 1997).
The minimal contig sizes as determined by counting only
non-overlapping clones are: 1100 kb, 900 kb, 510 kb, 380
kb and 270 kb. Insert size of BAC clones was measured by
running pulsed-field gel electrophoresis after digesting
DNA with NotI.
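
The figures reported above can be cross-checked with simple
arithmetic. The following Python sketch uses only numbers stated in
this Example; the variable names are illustrative, and treating the
59 STSs at one STS per 60 kb as an estimate of the covered span is an
assumption made here for the purpose of the check.

```python
# Cross-check of the contig figures stated in Example 1.
# Variable names are illustrative; all values come from the text above.

minimal_contig_sizes_kb = [1100, 900, 510, 380, 270]
new_sts, existing_sts = 24, 35
kb_per_sts = 60
interval_kb_low, interval_kb_high = 4000, 5000   # "4-5 Mb"
covered_kb = 3500                                # "approximately 3.5 Mb"

# Lower bound on covered sequence from non-overlapping clones only.
total_minimal_kb = sum(minimal_contig_sizes_kb)

# Span implied by the STS count and density (assumption: even spacing).
total_sts = new_sts + existing_sts
implied_span_kb = total_sts * kb_per_sts

# Fraction of the D21S55-MX1 interval covered, for both interval sizes.
fraction_low = covered_kb / interval_kb_high
fraction_high = covered_kb / interval_kb_low

if __name__ == "__main__":
    print(total_minimal_kb, total_sts, implied_span_kb)
    print(fraction_low, fraction_high)
```

The minimal contig sizes sum to 3160 kb, somewhat below the stated
3.5 Mb, which is expected because only non-overlapping clones are
counted; the STS-based estimate of 3540 kb is close to 3.5 Mb.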

EXAMPLE 2 

Direct cDNA Selection 

A modified direct cDNA selection technique
(Yamakawa et al., Hum. Mol. Genet. 4:709-716, 1995;
Yamakawa et al., Cytogenet. Cell Genet. 74:140-145, 1996)
was applied to BAC-423A5, BAC-430F1, BAC-628H2, BAC-371H8
and PAC-31P10 (Figure 1) by using cDNA from trisomy 21
human fetal brain, and the selected fragments were then
subcloned into a plasmid vector.

Total RNA was isolated from 14 week trisomy 21
fetal brain using TRI Reagent™ (Molecular Research Center,
Inc.). Poly(A)+ RNA was isolated using Poly(A) Quick
mRNA isolation kit (STRATAGENE). Double stranded cDNA
was synthesized using Superscript™ Choice System (GIBCO
BRL) from 5 µg trisomy 21 fetal brain poly(A)+ RNA using
1 µg oligo(dT)15 or 0.1 µg random hexamer. The entire
synthesis reaction was purified by Gene Clean® II kit
(BIO101, Inc.) and then kinased. Sau3AI linker was
attached to the cDNA which was subsequently digested with
Sau3AI. The reaction was purified using Gene Clean.
MboI linker was attached to the cDNA and the reaction
purified by Gene Clean (Morgan et al., supra, 1992). The
synthesized product was amplified by PCR using one strand
of MboI linker (5'CCTGATGCTCGAGTGAATTC3') (SEQ ID NO:4)
as a primer. PCR cycling conditions were 40 cycles of
94°C/15 seconds, 60°C/23 seconds, 72°C/2 minutes in
100 µl of 1X PCR buffer (Promega), 3 mM MgCl2, 5.0 units of
Taq polymerase (Promega), 2 µM primer and 0.2 mM dNTPs.
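
As a planning aid, the cycling conditions and reagent concentrations
stated above can be converted into a total hold time and per-reaction
amounts. The following Python sketch is illustrative only: the
function names are hypothetical, ramp times between temperatures are
not given in the text and are ignored, and the 100 µl volume, 2 µM
primer and 0.2 mM dNTP values are taken as read from the paragraph
above.

```python
# Illustrative arithmetic for the PCR conditions stated above.
# Ramp times between steps are not specified and are ignored here.

CYCLES = 40
STEP_SECONDS = {"denature_94C": 15, "anneal_60C": 23, "extend_72C": 120}
REACTION_VOLUME_UL = 100


def total_hold_seconds(cycles: int, steps: dict) -> int:
    """Total programmed hold time across all cycles, in seconds."""
    return cycles * sum(steps.values())


def amount_pmol(concentration_um: int, volume_ul: int) -> int:
    """Amount in pmol; 1 micromolar equals 1 pmol per microliter."""
    return concentration_um * volume_ul


total_s = total_hold_seconds(CYCLES, STEP_SECONDS)
primer_pmol = amount_pmol(2, REACTION_VOLUME_UL)      # 2 uM primer
dntp_pmol = amount_pmol(200, REACTION_VOLUME_UL)      # 0.2 mM = 200 uM dNTPs

if __name__ == "__main__":
    print(total_s, round(total_s / 60, 1))
    print(primer_pmol, dntp_pmol)
```

Under these assumptions the programmed hold time is 6320 seconds
(about 105 minutes), and each 100 µl reaction contains 200 pmol of
primer and 20 nmol of dNTPs.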

Nineteen BAC DNAs (total 2.5 µg) and 2 PAC DNAs
between the region ETS2 and MX1 were prepared using
QIAGEN plasmid kit and were biotinylated using Nick
Translation Kit and biotin-16-dUTP (Boehringer
Mannheim). 3 µg of heat denatured PCR amplified cDNA
was annealed with 3 µg of heat denatured COT1 DNA (BRL)
in 100 µl hybridization buffer (750 mM NaCl, 50 mM
NaPO4 (pH 7.2), 5 mM EDTA, 5X Denhardt's, 0.05% SDS and 50%
formamide) at 42°C for two hours. After
prehybridization, 1.2 µg of heat denatured biotinylated
BAC DNA was added and incubated at 42°C for 16 hours.
cDNA-BAC DNA hybrids were precipitated with EtOH and
dissolved in 60 µl of 10 mM Tris-HCl (pH 8.0), 1 mM EDTA.
After addition of 40 µl 5 M NaCl, the DNA was incubated
with magnetic beads (Dynabeads M-280, Dynal) at 25°C for
1 hour with gentle rotating to allow attachment of the
DNA to the magnetic beads. The beads were then washed
twice by pipetting in 400 µl of 2X SSC, setting in magnet
holder (MPC-E™, Dynal) for 30 seconds and removing the
supernatant. Four additional washes were performed in
0.2X SSC at 68°C for 10 minutes each with transfer of the
beads to new tubes at each wash. cDNAs were eluted in
100 µl of distilled water for 10 minutes at 80°C with
occasional mixing. The eluted cDNAs were amplified by
PCR as described above. After twice repeating the
selection procedure using magnetic beads, amplified cDNAs
were digested with EcoRI and subcloned into pBlueScript
KS+ (STRATAGENE). Insert DNAs were isolated from the
subclones, and were analyzed by Southern hybridization
and DNA sequencing.

The direct cDNA selection procedure using 19
BACs and 2 PACs between ETS2 and MX1 generated a total of
145 unique cDNA fragments. Genbank and TIGR homology
searches using FASTA revealed matches to ETS2, HMG14,
PEP19, a Na-K ATPase, Titan ESTs, MX1-region ESTs, and 14
ESTs of unknown function. A cDNA library from a trisomy
21 fetal brain at 14 weeks gestation was screened using
one of these unique cDNA fragments labeled "E51"
(SEQ ID NO:3).

EXAMPLE 3 

Isolation of Human DS-CAM cDNA Using cDNA Library
Screening

A trisomy 21 human fetal brain (14 weeks of
age) cDNA library was constructed using ZAP-cDNA®
synthesis kit (STRATAGENE) which generates a
unidirectional cDNA library. Briefly, double-stranded
cDNA was synthesized from 5 µg trisomy 21 fetal brain
poly(A)+ RNA using a hybrid oligo(dT)-XhoI linker primer
with 5-methyl dCTP. An EcoRI linker was attached to the
cDNA which was subsequently digested with EcoRI and XhoI,
and then cloned into UNI-ZAP XR vector (STRATAGENE). The
library was packaged using Gigapack® II Gold packaging
extract. The titer of the original library was 1.1 x 10^6
p.f.u./package. The library was amplified once. A
blue-white color assay indicated that 99% of the clones
had inserts.

Screening of the trisomy 21 fetal brain cDNA
library was performed using one of the 145 unique cDNA
fragments labeled "E51" (SEQ ID NO:3) prepared as
described above. Phages were plated to an average
density of 1 x 10⁵ per 175 cm² plate. Plaque lifts of 20
plates (2 x 10⁶ phages) were made using duplicated nylon
membranes (Hybond-N+; Amersham). Hybridized membranes
were washed to a final stringency of 0.2X SSC, 0.1X SDS at
65°C. The filters were exposed overnight onto X-ray
film.

Sixty-two positive clones were identified out of 2
x 10⁶ clones in the original library. Eighteen of these
positive phage clones were converted to plasmids, and
their DNAs were isolated. These cDNAs were independently
numbered as separate DS-CAM (Down Syndrome Cell Adhesion
Molecule) clones. The length of the inserts of these
clones ranged from 2.4 kb to 6.6 kb. Exon trapping
(Buckler et al., Proc. Natl. Acad. Sci. USA 88:4005-4009,
1991; Church et al., Nature Genet. 6:98-105, 1994) was
also used to isolate cDNAs in the BAC and PAC contig.
With this approach, three exons identified from BAC-539E7
and one from BAC-430F1 were found to identify the same
sequences as those isolated by cDNA selection.
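
The scale of the screen reported in this Example can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical helpers, and the inputs are the plate count, plating density and number of positive clones stated above.

```python
# Illustrative check of the library-screening numbers reported in this Example.
# Function names are hypothetical helpers, not part of any published protocol.

def phages_screened(plates: int, phages_per_plate: float) -> float:
    """Total number of plaques examined across all plates."""
    return plates * phages_per_plate


def positive_frequency(positives: int, screened: float) -> float:
    """Fraction of screened plaques that hybridized to the probe."""
    return positives / screened


screened = phages_screened(plates=20, phages_per_plate=1e5)
frequency = positive_frequency(positives=62, screened=screened)

print(f"plaques screened: {screened:.1e}")
print(f"positive clones per plaque screened: {frequency:.2e}")
```

On these figures, roughly three plaques in every 100,000 screened hybridized to the E51 probe.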

Sequence analysis of one of the clones, labeled
DS-CAM-42, revealed a 6110 bp DNA sequence which
contained a large ORF (5687 bp) as well as 3'-UTR
sequence (423 bp), but the 5'-UTR and start codon were not
located in clone DS-CAM-42. To characterize the 5' end,
two further clones, DS-CAM-18 of 6.5 kb and DS-CAM-52 of
6.6 kb, were characterized. Sequence analyses of these
clones close to the 5' end overlap with sequence at the
5' end of DS-CAM-42. However, DS-CAM-18 extends 416 bp
farther 5', and DS-CAM-52 extends 494 bp farther 5', than
DS-CAM-42. The extra 494 bp sequence extends the ORF by
43 bp at the 5' end and contains a start codon. Two stop
codons occur 330 bp and 427 bp upstream of the start
codon. The 494 bp of additional 5' sequence found in
DS-CAM-52 combined with DS-CAM-42 (6604 bp) yield a
consensus cDNA that encodes one isoform of the invention
protein labeled DS-CAM1. The DS-CAM1 cDNA contains an
open reading frame of 5730 bp (SEQ ID NO:1) coding for a
1910 amino acid protein (SEQ ID NO:2; approximately 211
kilodaltons), flanked by 452 bp of 5'-UTR and 422 bp of
3'-UTR. The 5'-UTR is highly GC rich (81% GC over 452
bp) and contains 13 MspI sites, as well as 72 CG and 93
GC dinucleotide pairs.
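
The lengths quoted above can be cross-checked against one another, and the 5'-UTR statistics (GC content, CG and GC dinucleotides, MspI sites) are the kind of figures a short script can compute. The following is a minimal sketch, not the method used by the inventors: the `composition` helper and the toy input are hypothetical, the CDS coordinates 453..6185 are taken from the sequence listing for SEQ ID NO:1, and CCGG is the MspI recognition site.

```python
# Consistency check of the cDNA lengths reported for DS-CAM-42, DS-CAM-52
# and the DS-CAM1 consensus, plus a toy sequence-composition helper.

LEN_DSCAM42 = 6110       # bp, clone DS-CAM-42
ORF_IN_DSCAM42 = 5687    # bp of ORF present in DS-CAM-42
UTR3_IN_DSCAM42 = 423    # bp of 3'-UTR reported for DS-CAM-42
EXTRA_5PRIME = 494       # bp of additional 5' sequence from DS-CAM-52
ORF_EXTENSION = 43       # bp of ORF contained in that additional sequence

consensus_len = LEN_DSCAM42 + EXTRA_5PRIME
orf_len = ORF_IN_DSCAM42 + ORF_EXTENSION
codons = orf_len // 3

# CDS feature from the sequence listing (453..6185); it includes the stop codon.
cds_span = 6185 - 453 + 1


def composition(seq: str) -> dict:
    """GC fraction plus overlapping counts of CG, GC and CCGG (MspI site)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def count(motif: str) -> int:
        # Slide one base at a time so overlapping occurrences are counted.
        return sum(seq.startswith(motif, i) for i in range(len(seq)))

    return {
        "gc_fraction": (seq.count("G") + seq.count("C")) / len(seq),
        "CG": count("CG"),
        "GC": count("GC"),
        "MspI_sites": count("CCGG"),
    }


print(consensus_len, orf_len, codons, cds_span)
print(composition("CCGGCG"))  # toy input, not the DS-CAM 5'-UTR
```

The totals agree: 6110 + 494 = 6604 bp, 5687 + 43 = 5730 bp, and 5730 / 3 = 1910 codons; the 5733 bp CDS span is the 5730 bp reading frame plus a stop codon.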



The DS-CAM1 protein contains an extracellular
component at the N-terminus consisting of nine tandemly
repeated Ig-like C2 type domains and a tenth Ig-like C2
domain located between domains four and five of an array
of six repeated fibronectin type III domains (Figure 2).
Each Ig-like C2 domain consists of approximately 100
amino acids with a pair of conserved cysteines separated
by 49-56 residues. A single transmembrane domain of 22
amino acids was defined by using the TMBASE program
(Hoffmann and Stoffel, Biol. Chem. Hoppe-Seyler 374:166,
1993). The remaining 294 amino acids at the C-terminus,
corresponding to the cytoplasmic domain, have partial
homologies to the mouse M-phase inducer phosphatase 2
(Kakizuka et al., Genes Dev. 6:578-590, 1992) in two
regions, one with 34% identity and 52% similarity over 46
bp and a second with 38% identity and 52% similarity over
21 bp. Partial homologies were also found to the homolog
of the Drosophila glass gene (O'Neill et al., Proc. Natl.
Acad. Sci. USA 92:6557-6561, 1995), with 30% identity and
52% similarity over 42 bp, and to the mouse delta opioid
receptor (Evans et al., Science 258:1952-1955, 1992), with
43% identity and 60% similarity over 30 bp. The putative
protein contains 16 potential N-glycosylation sites.
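
Potential N-glycosylation sites are conventionally predicted by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below shows one way to do this; it assumes that simple definition (the source does not state which rule or program was used), and the `find_sequons` name and toy peptide are hypothetical.

```python
import re
from typing import List

# N-X-S/T sequon with X != Pro. The lookahead makes the match zero-width,
# so overlapping sequons (e.g. N-N-S-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


toy_peptide = "ANATNPSNNSS"  # hypothetical sequence, not DS-CAM
print(find_sequons(toy_peptide))  # N-P-S at position 5 is excluded
```

Applied to the one-letter form of SEQ ID NO:2, a scan of this kind would list candidate sites; whether it reproduces the 16 sites reported depends on the exact rule the inventors applied.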

A homology search of the predicted amino acid
sequence of the 5730 bp open reading frame of DS-CAM1
(SEQ ID NO:1) to genes registered in the Genbank and the
EMBL databases was conducted by using the BLAST-P program
(Altschul et al., J. Mol. Biol. 215:403-410, 1990). The
predicted amino acid sequence revealed homologies to
multiple proteins (Figure 4) including CAM-L1 (Moos et
al., Nature 334:701-703, 1988), BIG-1 (brain-derived
immunoglobulin (Ig) superfamily molecule-1) (Yoshihara et
al., Neuron 13:415-426, 1994), DCC (deleted in colon
cancer) (Fearon et al., Science 247:49-56, 1990), and
revealed DS-CAM to belong to a class of the
immunoglobulin (Ig) superfamily. Homology searches with
sequences of Ig type-C2 domains and fibronectin type-III
domains of the most highly related Ig-superfamily members
(CAM-L1, DCC, and axonin-1) were conducted by using the
FASTA program (Pearson and Lipman, Proc. Natl. Acad. Sci.
USA 85:2444-2448, 1988).

In addition, a splice variant cDNA sequence
encoding a non-membrane bound isoform of DS-CAM1,
referred to herein as DS-CAM2, is provided herein. Two
human DS-CAM cDNA clones (DS-CAM-18 and DS-CAM-52) were
found to contain identical deletions of 191 bp that occur
in neighboring exons and that delete bp 5133 to 5323 of
the SEQ ID NO:1 cDNA sequence encoding DS-CAM1 (Figure
3). The resulting splice variant transcript encoding
DS-CAM2 (SEQ ID NO:10) is deleted for the entire
transmembrane domain that is encoded by the more 3' of
these exons. Further, the deletion changes the reading
frame and creates a stop codon 36 bp downstream of the
deletion, resulting in a soluble extracellular protein of
1571 amino acids (SEQ ID NO:11). The distal border of
the resulting deletion contains the canonical AG of the
RNA splicing consensus acceptor site. The proximal
border contains a variant of the donor splice site
consensus sequence (Jackson, Nucl. Acids Res. 19:3795-
3798, 1991).
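
The frameshift described above follows from the coordinates alone. The sketch below works through that arithmetic; the `deletion_summary` helper is hypothetical, and the CDS start at base 453 is taken from the CDS feature of SEQ ID NO:1 in the sequence listing.

```python
# Arithmetic behind the DS-CAM2 splice variant, using coordinates in SEQ ID NO:1.

CDS_START = 453                    # first base of the start codon
DEL_FIRST, DEL_LAST = 5133, 5323   # first and last deleted bases
DSCAM2_LENGTH_AA = 1571            # reported length of the DS-CAM2 protein


def deletion_summary(cds_start: int, del_first: int, del_last: int) -> dict:
    """Describe how an internal deletion affects the reading frame."""
    deleted = del_last - del_first + 1
    upstream = del_first - cds_start   # coding bases before the deletion
    return {
        "deleted_bp": deleted,
        "frameshift": deleted % 3 != 0,
        "codons_before_deletion": upstream // 3,
        "starts_on_codon_boundary": upstream % 3 == 0,
    }


summary = deletion_summary(CDS_START, DEL_FIRST, DEL_LAST)
# Residues translated in the shifted frame before the new stop codon.
shifted_residues = DSCAM2_LENGTH_AA - summary["codons_before_deletion"]

print(summary)
print("residues read in the shifted frame:", shifted_residues)
```

The deletion is 191 bp, which is not a multiple of three, so the frame shifts. It begins after 1560 complete codons, leaving 11 residues to be read in the new frame; 11 codons plus a stop codon span 36 bp, which is one reading of the "36 bp downstream" statement and is an inference from the stated numbers rather than something the source spells out.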



To confirm that the DS-CAM cDNA originated from
the BACs and PACs in the Down syndrome region and to
determine the genomic size of DS-CAM, the longest DS-CAM
cDNA clones (DS-CAM-42, 6.1 kb; DS-CAM-18, 6.5 kb;
DS-CAM-52, 6.6 kb) were hybridized to Southern blots
containing the BAC and PAC clone contig (Figure 1).
DS-CAM-42, 18 and 52 hybridized to BACs 423A5, 430F1,
628H2, 539E7, 371H8, 825E1, 593D1, 261F12, 30E4, 385B7,
388F4, and to PACs 31P10, 58D10. BACs 816F6, 116E8,
720G4, 619H8 were positive only for DS-CAM-18 and
DS-CAM-52 but negative for DS-CAM-42. All other BACs
shown in Figure 1 were negative. These results indicate
that the DS-CAM gene spans 900 kb-1200 kb genomic DNA and
covers a gap in this BAC and PAC contig indicated by an
arrowhead as well as in the available YAC contigs
(Korenberg et al., Genome Res. 5:427-443, 1995; Gardiner
et al., Somat. Cell Mol. Genet. 21:399-414, 1995).
DS-CAM cDNA sequences were confirmed to originate from
these BACs and PACs by direct sequencing of the BACs and
PACs as templates using cDNA sequence-specific primers.
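
The hybridization pattern above can be summarized with set operations. The sketch below is only a bookkeeping aid; the variable names are hypothetical, and the suggestion that clones positive only for the longer cDNAs carry the 5'-most sequence is a plausible reading of the data rather than a statement from the source.

```python
# Genomic clones (BACs and PACs) reported positive for each cDNA probe.
SHARED = {
    "423A5", "430F1", "628H2", "539E7", "371H8", "825E1", "593D1",
    "261F12", "30E4", "385B7", "388F4",   # BACs
    "31P10", "58D10",                      # PACs
}
LONGER_CDNA_ONLY = {"816F6", "116E8", "720G4", "619H8"}

positives = {
    "DS-CAM-42": set(SHARED),
    "DS-CAM-18": SHARED | LONGER_CDNA_ONLY,
    "DS-CAM-52": SHARED | LONGER_CDNA_ONLY,
}

# Clones detected by both longer cDNAs but missed by DS-CAM-42, which lacks
# the extra 5' sequence present in DS-CAM-18 and DS-CAM-52.
five_prime_candidates = (
    positives["DS-CAM-18"] & positives["DS-CAM-52"]
) - positives["DS-CAM-42"]

all_positive = set().union(*positives.values())

print(sorted(five_prime_candidates))
print(len(all_positive), "genomic clones positive for at least one probe")
```

In total 17 genomic clones were positive for at least one probe, four of them only for the two cDNAs that extend farther 5'.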



The map position of DS-CAM on chromosome
21q22.2-22.3 was confirmed by using clone DS-CAM-42 as a
probe for fluorescence in situ hybridization. Two
independent experiments were performed and over 100
metaphase cells were evaluated. Signals were clearly
seen on two chromatids of at least one chromosome in 85%
of cells. There were no other double signal sites seen
in greater than 1% of cells.



EXAMPLE 4 



Northern Blot Analysis Of Human DS-CAM Expression

Inserts containing DS-CAM cDNA were excised
from the base vector by digestion with XhoI and EcoRI.
After labeling using the random priming method (RadPrime
Labeling System; GIBCO BRL), followed by purification
using G-50 Sephadex columns (Quick Spin Column;
Boehringer Mannheim), the fragments were used as probes
for Northern hybridization using Multiple Tissue Northern
Blot (Clontech). A Northern blot assay was conducted
using DS-CAM cDNA as a probe in various fetal and adult
tissues including heart, brain, placenta, lung, liver,
skeletal muscle, kidney, and pancreas. Northern
hybridization was performed by following the
manufacturer's instructions. The hybridized membrane was
washed at a final stringency of 0.1X SSC and 0.1X SDS at
50°C. The filter was exposed to X-ray film (Kodak X-OMAT
AR) at -70°C for 1-5 days.

The results of Northern analysis using human
fetal tissues showed that 8.5 kb and 7.6 kb transcripts
are expressed only in fetal brain and not expressed in
fetal lung, fetal liver and fetal kidney. In adult
tissues, three transcripts of 9.7 kb, 8.5 kb, and 7.6 kb
are present in the brain. Placenta shows faint bands,
and the sizes are similar to those in brain. In skeletal
muscle, a faint smaller band (6.5 kb) is detected. In
multiple parts of the adult human brain, transcripts of
9.7 kb, 8.5 kb and 7.6 kb are differentially expressed.
The 9.7 kb transcript is highly expressed in the
substantia nigra, moderately expressed in amygdala and
hippocampus, and less expressed in the whole brain. A
similar pattern is obtained using a PCR product which
spans the 191 bp deletion found in clones DS-CAM-18 and
DS-CAM-52 encoding the splice variant sequence
corresponding to DS-CAM2. Thus, splice variant cDNA
transcripts encoding a DS-CAM family of proteins are
clearly contemplated by the present invention.

EXAMPLE 5 

RT-PCR Assays Of Human DS-CAM Expression

Reverse-transcriptase polymerase chain reaction
(RT-PCR) assays versus cDNA libraries of various human
tissues were conducted using primers numbered B9-131F
(SEQ ID NO:5) and B9-131R (SEQ ID NO:6). The results
demonstrated expression of human DS-CAM mRNA in fetal and
adult brain, and fetal kidney. In addition, a breast
carcinoma cell line showed expression of human DS-CAM
mRNA.

The cDNAs from 13 independent human fetal and
adult sources were analyzed by PCR using primer pairs
that flanked the alternatively spliced region that
results in a 191 base pair deletion of nucleotides 5133-
5323 of the DS-CAM1 cDNA set forth in SEQ ID NO:1. The
primers were designed to generate products of different
sizes for each of the two alternatively spliced
transcripts: 536 bp corresponding to the non-deleted
DS-CAM1 transcript and 345 bp corresponding to the
deleted DS-CAM2 transcripts. The analyses included adult
samples from amygdala (24 years), skeletal muscle (36
years) and three independent lymphoblastoid cell lines.
Fetal samples included whole brain of a trisomy 21 fetus
(14 weeks), four from whole brain (4.5-13 weeks), one
from temporal lobe (28 weeks) and two from heart (4.5 and
13 weeks). The results indicate that all fetal and adult
samples produced two bands corresponding to PCR products
of the predicted sizes, which indicates the expression of
two alternatively spliced transcripts.
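
The two expected product sizes differ by exactly the length of the deletion, so observed bands can be assigned to an isoform by size. The sketch below illustrates that assignment; the `call_isoforms` helper and its 10 bp tolerance are assumptions for illustration, not part of the described assay.

```python
from typing import Iterable, Set

FULL_LENGTH_BP = 536     # product from the non-deleted DS-CAM1 transcript
DELETION_BP = 191        # length of the alternatively spliced region
SPLICED_BP = FULL_LENGTH_BP - DELETION_BP   # product from DS-CAM2


def call_isoforms(band_sizes_bp: Iterable[float], tolerance_bp: float = 10) -> Set[str]:
    """Assign observed band sizes to isoforms; unmatched bands are ignored."""
    calls = set()
    for size in band_sizes_bp:
        if abs(size - FULL_LENGTH_BP) <= tolerance_bp:
            calls.add("DS-CAM1")
        elif abs(size - SPLICED_BP) <= tolerance_bp:
            calls.add("DS-CAM2")
    return calls


print(SPLICED_BP)
print(sorted(call_isoforms([540, 350])))  # hypothetical gel readings
```

A sample showing bands near both sizes, as all 13 samples did, would be called positive for both transcripts.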

EXAMPLE 6 

Isolation of mouse DS-CAM cDNA clones

A mouse brain cDNA library was prepared from 19
week old female C57 Black/6 mice in the Uni-ZAP XR Vector
(STRATAGENE). The cDNAs were oligo-dT primed and cloned
unidirectionally into the EcoRI and XhoI sites of the
vector. The average insert size is 1.0 kb. The library
was screened using a human DS-CAM cDNA clone as a probe.
Two partial mouse DS-CAM cDNA clones were isolated and
sequenced. The combined nucleotide sequences of these
clones are set forth in SEQ ID NO:7, SEQ ID NO:8 and
SEQ ID NO:9, and were found to represent the 5', middle
and 3' portions, respectively, of cDNA encoding a mouse
DS-CAM.

EXAMPLE 7 

Hybridization analysis of DS-CAM cDNA in mouse tissues

BALB/c and C57BL/6 x DBA/2 embryos, fetuses and
postnatal brains were fixed and embedded as described in
detail in Lyons et al. (J. Neurosci. 15:5727-5738,
1995). Embryos were fixed in 4% paraformaldehyde in
phosphate buffered saline (PBS) overnight, dehydrated and
infiltrated with paraffin. Five to seven micron serial
sections were mounted on gelatinized slides. Two
sections were mounted per slide, deparaffinized in xylene,
rehydrated and post-fixed. The sections were digested
with proteinase K, post-fixed, treated with
tri-ethanolamine/acetic anhydride, washed and dehydrated.
cRNA probes were prepared from DS-CAM-M-14. The plasmid
was linearized with XbaI and T7 polymerase was used to
generate the antisense cRNA. The plasmid was linearized
with KpnI and T3 polymerase was used to generate the
sense control cRNA. The cRNA transcripts were
synthesized according to manufacturer's conditions
(STRATAGENE) and labeled with ³⁵S-UTP (>1000 Ci/mmol;
Amersham). cRNA transcripts larger than 100 nucleotides
were subjected to alkali hydrolysis to give a mean size
of 70 bases for efficient hybridization.
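
The source does not state how the hydrolysis time was chosen. One rule of thumb often quoted in in situ hybridization protocols is t = (L0 - Lf) / (k x L0 x Lf), with lengths in kb, t in minutes and k of about 0.11 per kb per minute. The sketch below applies that rule; the formula, the rate constant and the 1.0 kb starting length are all assumptions for illustration and are not taken from this document.

```python
def hydrolysis_minutes(initial_kb: float, final_kb: float,
                       rate_constant: float = 0.11) -> float:
    """Estimated alkali hydrolysis time (minutes) to reach a mean probe length.

    Uses the assumed rule of thumb t = (L0 - Lf) / (k * L0 * Lf).
    """
    if initial_kb <= final_kb:
        raise ValueError("initial length must exceed the target length")
    return (initial_kb - final_kb) / (rate_constant * initial_kb * final_kb)


# Hypothetical 1.0 kb transcript reduced to the 70-base mean size stated above.
print(round(hydrolysis_minutes(initial_kb=1.0, final_kb=0.07), 1))
```

Because the length of the DS-CAM-M-14 transcript is not given here, the printed time is an example only.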

Sections were hybridized overnight at 52°C in
50% deionized formamide, 0.3 M NaCl, 20 mM Tris-HCl pH
7.4, 5 mM EDTA, 10 mM NaPO4, 10% dextran sulfate, 1X
Denhardt's, 50 µg/ml total yeast RNA, and 50-75,000
cpm/µl ³⁵S-labeled cRNA probe. The tissue was subjected to
stringent washing at 65°C in 50% formamide, 2X SSC, 10 mM
DTT and washed in PBS before treatment with 20 µg/ml
RNase A at 37°C for 30 minutes. Following washes in 2X
SSC and 0.1X SSC for 10 minutes at 37°C, the slides were
dehydrated and dipped in Kodak NTB-2 nuclear track
emulsion and exposed for 2-3 weeks in light-tight boxes
with desiccant at 4°C. Photographic development was
carried out in Kodak D-19. Slides were counterstained
lightly with toluidine blue and analyzed using both
light- and darkfield optics of a Zeiss Axiophot
microscope. Sense control cRNA probes (identical to the
mRNAs) always gave background levels of hybridization
signal. Embryonic structures were identified with the
help of the following atlases: Rugh (The Mouse: Its
Reproduction and Development. Oxford Univ. Press, Oxford,
UK, 1990), Kaufman (The Atlas of Mouse Development.
Acad. Press, New York, NY, 1992), and Altman and Bayer
(supra, 1995).



Tissue in situ hybridization analysis was
performed using a mouse cDNA as a probe on sections of
normal mouse embryos from days 8.5-17.5 post coitum (pc)
as well as in newborn, two weeks and adult brains as
described above. The results indicate that there is no
detectable expression of DS-CAM at 8.5 days pc. At 9.5
days pc, expression was detected in the neuroepithelium.
Low levels of expression were detected within the
branchial arches, suggestive of migrating neural crest
cells. At 10.5 days pc, the trigeminal ganglia (neural
crest derived) begin to express the transcript and
expression within the branchial arches was more evident.

WO 98/17795 PCT/US97/19547

Expression at 11.5 days pc was abundant
throughout the brain. The transcript was found within
the regions of the nervous system that differentiate
earliest during development (Altman and Bayer, supra,
1995). In the brain, this includes the ventral-most
regions, such as the thalamus and medulla. Some
expression was detected within the olfactory epithelium.
Expression within the neural tube begins in two areas:
the ventrolateral (corresponding to the areas in which
the motor neurons differentiate) and the lateral gray
columns (that later form commissural neurons) (Leber et
al., J. Neurosci. 15:1236-1248, 1990). The dorsal root
ganglia (neural crest derived) expressed the transcript
at 11.5 days pc. The trigeminal ganglia show higher
levels at 11.5 days pc than they did at 10.5 days.
Migrating neural crest can be seen within the maxilla,
the mandibular arch, and in the developing gut. Signal
was observed within the mesenchyme surrounding the
umbilical vein and artery.

At 12.5 days pc, expression was more extensive
than at 11.5 days pc. More of the nervous system
exhibits expression of the transcript, including a larger
portion of midbrain, the pontine areas, the basal ganglia
and the outermost layer of cortex. Neurons in this layer
have undergone mitosis in the subependymal layer of the
cortex and migrated into the mantle layer of the cerebral
cortex as differentiated cells (Smart et al., J. Comp.
Neurol. 116:325-347, 1961).



At 13.5 days pc, expression was seen throughout
most of the brain. The outermost layer of the gut also
appears to be expressing at this stage; these cells are
neural crest derived and form the myenteric ganglia. At
15.5 and 16.5 days pc, most of the neural crest derived
neural structures have some expression. For example, the
regions of the snout that will develop into the sensory
structures at the base of the vibrissae, the pancreatic
ganglia, the heart ganglion, the enteric nervous system,
and the sympathetic trunk all express the transcript.

There is no expression within the umbilicus at
this stage. Two non-neuronal structures express this
gene, the gonad and the annulus fibrosus of the
intervertebral disk. The olfactory bulb exhibits signal
both in the granule cells and within the tufted mitral
cells. Within the newborn brain, the transcript was
expressed most extensively within the differentiating
regions such as the septal area, olfactory bulb, inferior
colliculus and hippocampus. In the adult brain, the gene
was expressed in many areas including amygdala, cortex,
hippocampus and thalamus. In the adult cerebellum the
transcripts were detected in the Purkinje cell layer and
in the deep cerebellar nuclei.



While the invention has been described in
detail with reference to certain preferred embodiments
thereof, it will be understood that modifications and
variations are within the spirit and scope of that which
is described and claimed.

Summary of Sequences

SEQ ID NO:1 is the nucleic acid sequence (and the
deduced amino acid sequence) of cDNA encoding a novel
human DS-CAM1 protein of the present invention.

SEQ ID NO:2 is the deduced amino acid sequence of a
human DS-CAM1 protein of the present invention.

SEQ ID NO:3 is the cDNA probe (labeled "E51") used to
isolate cDNA encoding human DS-CAM.

SEQ ID NO:4 is an MboI linker sequence.

SEQ ID NO:5 is a primer labeled B9-131F used in the
RT-PCR assay described in Example 5.

SEQ ID NO:6 is a primer labeled B9-131R used in the
RT-PCR assay described in Example 5.

SEQ ID NO:7 is the 5' region of a partial mouse-derived
cDNA clone encoding an invention DS-CAM protein.

SEQ ID NO:8 is the middle region of a partial
mouse-derived cDNA clone encoding an invention DS-CAM
protein.

SEQ ID NO:9 is the 3' region of a partial mouse-derived
cDNA clone encoding an invention DS-CAM protein.

SEQ ID NO:10 is the nucleic acid sequence (and the
deduced amino acid sequence) of cDNA encoding a novel
human DS-CAM2 protein of the present invention.

SEQ ID NO:11 is the deduced amino acid sequence of a
human DS-CAM2 protein of the present invention, which is
a splice variant of DS-CAM1 (SEQ ID NO:2).

SEQUENCE LISTING

(1) GENERAL INFORMATION:

   (i) APPLICANT: Cedars-Sinai Medical Center

   (ii) TITLE OF INVENTION: NUCLEIC ACID ENCODING DS-CAM
        PROTEINS AND PRODUCTS RELATED THERETO

   (iii) NUMBER OF SEQUENCES: 11

   (iv) CORRESPONDENCE ADDRESS:
        (A) ADDRESSEE: Campbell and Flores
        (B) STREET: 4370 La Jolla Village Drive, Suite 700
        (C) CITY: San Diego
        (D) STATE: CA
        (E) COUNTRY: USA
        (F) ZIP: 92122

   (v) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Floppy disk
        (B) COMPUTER: IBM PC compatible
        (C) OPERATING SYSTEM: PC-DOS/MS-DOS
        (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

   (vi) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER:
        (B) FILING DATE:
        (C) CLASSIFICATION:

   (vii) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: US 60/029,322
        (B) FILING DATE: 25-OCT-1996

   (viii) ATTORNEY/AGENT INFORMATION:
        (A) NAME: Ramos, Robert T.
        (B) REGISTRATION NUMBER: 37,915
        (C) REFERENCE/DOCKET NUMBER: P-CE 2817

   (ix) TELECOMMUNICATION INFORMATION:
        (A) TELEPHONE: 619-535-9001
        (B) TELEFAX: 619-535-8949

(2) INFORMATION FOR SEQ ID NO:1:

   (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 6604 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: both
        (D) TOPOLOGY: both

   (ii) MOLECULE TYPE: cDNA

   (ix) FEATURE:
        (A) NAME/KEY: CDS
        (B) LOCATION: 453..6185

   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TGACTGAGGC CGGAGCACGG CAAAGATGAG CCTGCCCGCC CGCCTGCTGC CTGGATGCGG 60

AGGGTGAGGG CTGGCGCACG GGAGGCCGCT GGCTGCGCAT TCTGGGCGCC GAGTGCCCGG 120

GATGAGCTCA CGCCCGCGTC TGCGGCTCTC TCCACCTGCC GACCTGCCGG GGGCCCACTG 180

AGCTGACGGC GCACCTGGGC TCCGGCCGCA GCGTGGGGCG CGGCGCCCGG GAGCAGGTGT 240

GCAGGAGCGC AGCGCGCGGC GAGCGCAGCC CTCGCTCCGG AGCCCGGCCG CGCCGCGTGC 300

CCGGGCGGCT AGGCAGCGGC GGCGGCGGCG GCGGGCGGCG GGCGGGCGGC GGCCCCCGGG 360

CAGGTGCCGA GCGGCGAGCG GAGCCGGGCC GGGCGGAGCG CGGGGGGCGA GGCCGGCGCG 420

TCGCTCGCGG GAGGCCGGGG AGCGGCAGGG GC ATG TGG ATA CTG GCT CTC TCC 473
                                    Met Trp Ile Leu Ala Leu Ser
                                    1               5

TTG TTC CAG AGC TTC GCG AAT GTT TTC AGT GAA GAC CTA CAC TCC AGC 521
Leu Phe Gln Ser Phe Ala Asn Val Phe Ser Glu Asp Leu His Ser Ser
        10                  15                  20

CTC TAC TTT GTC AAT GCA TCT CTG CAA GAG GTA GTG TTT GCC AGC ACC 569
Leu Tyr Phe Val Asn Ala Ser Leu Gln Glu Val Val Phe Ala Ser Thr
    25                  30                  35

ACG GGG ACT CTG GTG CCC TGC CCC GCA GCA GGC ATC CCT CCT GTG ACT 617
Thr Gly Thr Leu Val Pro Cys Pro Ala Ala Gly Ile Pro Pro Val Thr
40                  45                  50                  55

CTC AGA TGG TAC CTA GCC ACG GGC GAG GAG ATC TAC GAT GTC CCC GGG 665
Leu Arg Trp Tyr Leu Ala Thr Gly Glu Glu Ile Tyr Asp Val Pro Gly
                60                  65                  70

ATC CGC CAC GTC CAC CCC AAC GGC ACT CTC CAA ATT TTC CCC TTC CCT 713
Ile Arg His Val His Pro Asn Gly Thr Leu Gln Ile Phe Pro Phe Pro
            75                  80                  85

CCT TCA AGC TTC AGT ACC TTA ATC CAT GAT AAT ACT TAT TAT TGC ACA 761
Pro Ser Ser Phe Ser Thr Leu Ile His Asp Asn Thr Tyr Tyr Cys Thr
        90                  95                  100

GCT GAA AAT CCT TCA GGG AAA ATT AGA AGT CAG GAT GTC CAC ATC AAG 809
Ala Glu Asn Pro Ser Gly Lys Ile Arg Ser Gln Asp Val His Ile Lys
    105                 110                 115

GCT GTT TTA CGG GAG CCC TAT ACA GTC CGT GTG GAG GAC CAG AAA ACC 857
Ala Val Leu Arg Glu Pro Tyr Thr Val Arg Val Glu Asp Gln Lys Thr
120                 125                 130                 135

ATG AGA GGC AAT GTT GCG GTC TTC AAG TGC ATT ATC CCC TCC TCG GTG 905
Met Arg Gly Asn Val Ala Val Phe Lys Cys Ile Ile Pro Ser Ser Val
                140                 145                 150

GAG GCG TAC ATC ACT GTC GTC TCA TGG GAG AAA GAC ACT GTT TCA CTT 953
Glu Ala Tyr Ile Thr Val Val Ser Trp Glu Lys Asp Thr Val Ser Leu
            155                 160                 165

GTC TCA GGA TCT AGA TTT CTC ATC ACA TCC ACG GGA GCC TTG TAT ATT 1001
Val Ser Gly Ser Arg Phe Leu Ile Thr Ser Thr Gly Ala Leu Tyr Ile
        170                 175                 180

AAA GAT GTA CAG AAT GAA GAT GGA TTG TAT AAC TAC CGC TGC ATC ACG 1049
Lys Asp Val Gln Asn Glu Asp Gly Leu Tyr Asn Tyr Arg Cys Ile Thr
    185                 190                 195

CGG CAT CGA TAC ACC GGA GAG ACG AGG CAG AGC AAC AGC GCC AGA CTT 1097
Arg His Arg Tyr Thr Gly Glu Thr Arg Gln Ser Asn Ser Ala Arg Leu
200                 205                 210                 215

TTT GTA TCA GAC CCA GCG AAC TCA GCC CCA TCC ATA CTG GAT GGG TTT 1145
Phe Val Ser Asp Pro Ala Asn Ser Ala Pro Ser Ile Leu Asp Gly Phe
                220                 225                 230

GAC CAT CGC AAA GCC ATG GCT GGG CAG CGT GTG GAG CTG CCT TGC AAA 1193
Asp His Arg Lys Ala Met Ala Gly Gln Arg Val Glu Leu Pro Cys Lys
            235                 240                 245

GCG CTC GGG CAC CCT GAG CCA GAT TAC CGC TGG CTG AAG GAC AAC ATG 1241
Ala Leu Gly His Pro Glu Pro Asp Tyr Arg Trp Leu Lys Asp Asn Met
        250                 255                 260

CCC CTG GAA CTT TCA GGG AGG TTC CAG AAG ACC GTG ACG GGG CTG CTC 1289
Pro Leu Glu Leu Ser Gly Arg Phe Gln Lys Thr Val Thr Gly Leu Leu
    265                 270                 275

ATT GAG AAC ATT CGC CCC TCG GAC TCA GGC AGC TAT GTT TGT GAA GTG 1337
Ile Glu Asn Ile Arg Pro Ser Asp Ser Gly Ser Tyr Val Cys Glu Val
280                 285                 290                 295

TCC AAC AGA TAC GGA ACT GCT AAG GTG ATA GGC CGC CTG TAC GTG AAA 1385
Ser Asn Arg Tyr Gly Thr Ala Lys Val Ile Gly Arg Leu Tyr Val Lys
                300                 305                 310

CAG CCA CTG AAA GCC ACC ATC AGT CCC AGG AAG GTT AAA AGC AGC GTG 1433
Gln Pro Leu Lys Ala Thr Ile Ser Pro Arg Lys Val Lys Ser Ser Val
            315                 320                 325

GGT AGC CAA GTT TCC TTG TCC TGC AGC GTG ACA GGA ACT GAG GAC CAG 1481
Gly Ser Gln Val Ser Leu Ser Cys Ser Val Thr Gly Thr Glu Asp Gln
        330                 335                 340

GAA CTC TCC TGG TAC CGC AAT GGT GAA ATC CTC AAC CCT GGA AAA AAT 1529
Glu Leu Ser Trp Tyr Arg Asn Gly Glu Ile Leu Asn Pro Gly Lys Asn
    345                 350                 355

GTG AGG ATC ACA GGG ATC AAC CAC GAA AAC CTT ATA ATG GAT CAC ATG 1577
Val Arg Ile Thr Gly Ile Asn His Glu Asn Leu Ile Met Asp His Met
360                 365                 370                 375

GTC AAA AGT GAC GGG GGC GCA TAC CAG TGC TTT GTG CGC AAG GAC AAG 1625
Val Lys Ser Asp Gly Gly Ala Tyr Gln Cys Phe Val Arg Lys Asp Lys
                380                 385                 390

CTG TCC GCT CAA GAC TAT GTG CAG GTG GTC CTT GAA GAT GGA ACT CCC 1673
Leu Ser Ala Gln Asp Tyr Val Gln Val Val Leu Glu Asp Gly Thr Pro
            395                 400                 405

AAA ATT ATT TCT GCC TTT AGT GAA AAG GTG GTG AGT CCA GCA GAG CCG 1721
Lys Ile Ile Ser Ala Phe Ser Glu Lys Val Val Ser Pro Ala Glu Pro
        410                 415                 420

GTT TCC CTT ATG TGC AAC GTG AAG GGA ACA CCT TTG CCC ACG ATC ACG 1769
Val Ser Leu Met Cys Asn Val Lys Gly Thr Pro Leu Pro Thr Ile Thr
    425                 430                 435

TGG ACC CTG GAC GAT GAC CCG ATT CTC AAG GGT GGC AGT CAC CGC ATC 1817
Trp Thr Leu Asp Asp Asp Pro Ile Leu Lys Gly Gly Ser His Arg Ile
440                 445                 450                 455

AGC CAG ATG ATC ACG TCG GAG GGG AAC GTG GTC AGC TAC CTG AAC ATC 1865
Ser Gln Met Ile Thr Ser Glu Gly Asn Val Val Ser Tyr Leu Asn Ile
                460                 465                 470

TCC AGC TCC CAG GTC CGG GAC GGG GGA GTC TAC CGC TGC ACT GCC AAC 1913
Ser Ser Ser Gln Val Arg Asp Gly Gly Val Tyr Arg Cys Thr Ala Asn
            475                 480                 485

AAC TCG GCG GGA GTC GTC CTG TAC CAG GCT CGA ATA AAC GTA AGA GGG 1961
Asn Ser Ala Gly Val Val Leu Tyr Gln Ala Arg Ile Asn Val Arg Gly
        490                 495                 500

CCT GCA AGC ATT CGA CCA ATG AAA AAC ATC ACA GCA ATA GCA GGA CGG 2009
Pro Ala Ser Ile Arg Pro Met Lys Asn Ile Thr Ala Ile Ala Gly Arg
    505                 510                 515

GAC ACA TAC ATT CAC TGT CGT GTG ATT GGC TAT CCG TAT TAC TCC ATT 2057
Asp Thr Tyr Ile His Cys Arg Val Ile Gly Tyr Pro Tyr Tyr Ser Ile
520                 525                 530                 535

AAA TGG TAC AAG AAC TCT AAC CTG CTT CCT TTC AAC CAC CGC CAA GTG 2105
Lys Trp Tyr Lys Asn Ser Asn Leu Leu Pro Phe Asn His Arg Gln Val
                540                 545                 550

GCA TTT GAG AAC AAT GGA ACT CTT AAA CTT TCA GAT GTG CAA AAG GAA 2153
Ala Phe Glu Asn Asn Gly Thr Leu Lys Leu Ser Asp Val Gln Lys Glu
            555                 560                 565

GTG GAC GAG GGG GAG TAC ACG TGC AAC GTG TTG GTT CAA CCA CAA CTC 2201
Val Asp Glu Gly Glu Tyr Thr Cys Asn Val Leu Val Gln Pro Gln Leu
        570                 575                 580

TCC ACC AGC CAG AGC GTC CAC GTG ACC GTG AAA GTT CCG CCT TTC ATA 2249
Ser Thr Ser Gln Ser Val His Val Thr Val Lys Val Pro Pro Phe Ile
    585                 590                 595

CAA CCC TTT GAG TTT CCA AGA TTC TCC ATT GGG CAG CGG GTC TTC ATC 2297
Gln Pro Phe Glu Phe Pro Arg Phe Ser Ile Gly Gln Arg Val Phe Ile
600                 605                 610                 615

CCC TGT GTT GTG GTC TCA GGG GAC TTA CCC ATC ACG ATC ACC TGG CAG 2345
Pro Cys Val Val Val Ser Gly Asp Leu Pro Ile Thr Ile Thr Trp Gln
                620                 625                 630

AAG GAT GGC CGG CCA ATC CCT GGG AGC CTT GGG GTG ACC ATT GAC AAT 2393
Lys Asp Gly Arg Pro Ile Pro Gly Ser Leu Gly Val Thr Ile Asp Asn
            635                 640                 645

ATT GAC TTC ACG AGC TCC TTG AGG ATT TCC AAT CTC TCG CTC ATG CAC 2441
Ile Asp Phe Thr Ser Ser Leu Arg Ile Ser Asn Leu Ser Leu Met His
        650                 655                 660

AAT GGG AAT TAC ACC TGC ATA GCC CGG AAT GAG GCC GCC GCT GTG GAG 2489
Asn Gly Asn Tyr Thr Cys Ile Ala Arg Asn Glu Ala Ala Ala Val Glu
    665                 670                 675

CAC CAA AGC CAG TTG ATT GTC AGA GTT CCT CCC AAG TTT GTG GTT CAG 2537
His Gln Ser Gln Leu Ile Val Arg Val Pro Pro Lys Phe Val Val Gln
680                 685                 690                 695

CCA CGG GAC CAG GAC GGG ATT TAT GGC AAA GCA GTC ATC CTC AAT TGT 2585
Pro Arg Asp Gln Asp Gly Ile Tyr Gly Lys Ala Val Ile Leu Asn Cys
                700                 705                 710

TCT GCT GAG GGT TAC CCT GTA CCT ACC ATC GTG TGG AAA TTC TCT AAA 2633
Ser Ala Glu Gly Tyr Pro Val Pro Thr Ile Val Trp Lys Phe Ser Lys
            715                 720                 725

GGT GCT GGG GTT CCC CAG TTC CAG CCA ATT GCC CTA AAT GGC CGA ATC 2681
Gly Ala Gly Val Pro Gln Phe Gln Pro Ile Ala Leu Asn Gly Arg Ile
        730                 735                 740

CAA GTT CTC AGC AAT GGG TCG TTG CTG ATC AAG CAT GTC GTG GAG GAA 2729
Gln Val Leu Ser Asn Gly Ser Leu Leu Ile Lys His Val Val Glu Glu
    745                 750                 755

GAC AGT GGC TAC TAC CTC TGC AAG GTC AGC AAC GAT GTG GGC GCA GAC 2777
Asp Ser Gly Tyr Tyr Leu Cys Lys Val Ser Asn Asp Val Gly Ala Asp
760                 765                 770                 775

GTC AGC AAG TCC ATG TAC CTC ACG GTT AAA ATT CCT GCG ATG ATA ACA 2825
Val Ser Lys Ser Met Tyr Leu Thr Val Lys Ile Pro Ala Met Ile Thr
                780                 785                 790

TCC TAT CCA AAT ACT ACC CTG GCC ACG CAG GGG CAG AAA AAG GAG ATG 2873
Ser Tyr Pro Asn Thr Thr Leu Ala Thr Gln Gly Gln Lys Lys Glu Met
            795                 800                 805

AGC TGC ACG GCG CAT GGT GAG AAG CCC ATT ATA GTC CGC TGG GAG AAG 2921
Ser Cys Thr Ala His Gly Glu Lys Pro Ile Ile Val Arg Trp Glu Lys
        810                 815                 820

GAG GAC CGA ATC ATT AAC CCT GAG ATG GCC CGT TAT CTT GTG TCC ACC 2969
Glu Asp Arg Ile Ile Asn Pro Glu Met Ala Arg Tyr Leu Val Ser Thr
    825                 830                 835

AAG GAG GTG GGA GAA GAG GTG ATT TCT ACT CTG CAG ATT TTG CCA ACT 3017
Lys Glu Val Gly Glu Glu Val Ile Ser Thr Leu Gln Ile Leu Pro Thr
840                 845                 850                 855

GTG AGA GAA GAT TCT GGT TTC TTT TCC TGC CAT GCT ATT AAT TCT TAT 3065
Val Arg Glu Asp Ser Gly Phe Phe Ser Cys His Ala Ile Asn Ser Tyr
                860                 865                 870

GGG GAG GAC CGT GGA ATA ATT CAG CTC ACA GTG CAA GAG CCC CCA GAC 3113
Gly Glu Asp Arg Gly Ile Ile Gln Leu Thr Val Gln Glu Pro Pro Asp
            875                 880                 885

CCT CCC GAA ATT GAG ATC AAA GAT GTC AAA GCA CGC ACA ATT ACG CTC 3161
Pro Pro Glu Ile Glu Ile Lys Asp Val Lys Ala Arg Thr Ile Thr Leu
        890                 895                 900

AGG TGG ACC ATG GGG TTT GAT GGA AAC AGT CCC ATC ACA GGC TAC GAT 3209
Arg Trp Thr Met Gly Phe Asp Gly Asn Ser Pro Ile Thr Gly Tyr Asp
    905                 910                 915

ATT GAA TGC AAA AAT AAA TCA GAC TCC TGG GAT TCT GCT CAG AGA ACC 3257
Ile Glu Cys Lys Asn Lys Ser Asp Ser Trp Asp Ser Ala Gln Arg Thr
920                 925                 930                 935

AAA GAT GTT TCC CCT CAG CTG AAC TCG GCC ACC ATC ATT GAT ATC CAC 3305
Lys Asp Val Ser Pro Gln Leu Asn Ser Ala Thr Ile Ile Asp Ile His
                940                 945                 950

CCT TCC TCC ACC TAC AGC ATC CGC ATG TAC GCC AAG AAC CGG ATT GGC 3353
Pro Ser Ser Thr Tyr Ser Ile Arg Met Tyr Ala Lys Asn Arg Ile Gly
            955                 960                 965

AAG AGC GAG CCC AGC AAC GAG CTC ACC ATC ACG GCG GAC GAG GCA GCT 3401
Lys Ser Glu Pro Ser Asn Glu Leu Thr Ile Thr Ala Asp Glu Ala Ala
        970                 975                 980

CCT GAT GGT CCA CCT CAG GAA GTT CAC CTG GAG CCT ATA TCA TCT CAG 3449
Pro Asp Gly Pro Pro Gln Glu Val His Leu Glu Pro Ile Ser Ser Gln
    985                 990                 995

AGC ATC AGG GTC ACA TGG AAG GCT CCC AAG AAA CAT TTG CAA AAT GGG 3497
Ser Ile Arg Val Thr Trp Lys Ala Pro Lys Lys His Leu Gln Asn Gly
1000                1005                1010                1015

ATT ATC CGT GGC TAC CAA ATA GGT TAC CGA GAG TAC AGC ACT GGG GGT 3545
Ile Ile Arg Gly Tyr Gln Ile Gly Tyr Arg Glu Tyr Ser Thr Gly Gly
                1020                1025                1030

AAC TTC CAA TTC AAC ATT ATC AGT GTC GAC ACC AGC GGG GAC AGT GAG 3593
Asn Phe Gln Phe Asn Ile Ile Ser Val Asp Thr Ser Gly Asp Ser Glu
            1035                1040                1045

GTT TAC ACC CTG GAC AAC CTG AAT AAG TTC ACT CAG TAC GGC CTG GTG 3641
Val Tyr Thr Leu Asp Asn Leu Asn Lys Phe Thr Gln Tyr Gly Leu Val
        1050                1055                1060

GTG CAG GCC TGT AAC CGG GCC GGC ACG GGG CCT TCT TCT CAG GAA ATC 3689
Val Gln Ala Cys Asn Arg Ala Gly Thr Gly Pro Ser Ser Gln Glu Ile
    1065                1070                1075

ATC ACC ACC ACT CTC GAG GAT GTG CCC AGT TAC CCC CCC GAA AAT GTC 3737
Ile Thr Thr Thr Leu Glu Asp Val Pro Ser Tyr Pro Pro Glu Asn Val
1080                1085                1090                1095

CAA GCC ATA GCA ACA TCA CCA GAA AGC ATA TCA ATA TCC TGG TCC ACA 3785
Gln Ala Ile Ala Thr Ser Pro Glu Ser Ile Ser Ile Ser Trp Ser Thr
                1100                1105                1110

CTT TCC AAG GAA GCC TTG AAT GGA ATT CTC CAG GGG TTC AGA GTC ATT 3833
Leu Ser Lys Glu Ala Leu Asn Gly Ile Leu Gln Gly Phe Arg Val Ile
            1115                1120                1125

TAC TGG GCC AAC CTC ATG GAC GGA GAG CTG GGT GAG ATT AAA AAC ATC 3881
Tyr Trp Ala Asn Leu Met Asp Gly Glu Leu Gly Glu Ile Lys Asn Ile
        1130                1135                1140

ACC ACC ACA CAG CCT TCA CTG GAG CTG GAC GGG CTG GAA AAG TAC ACC 3929
Thr Thr Thr Gln Pro Ser Leu Glu Leu Asp Gly Leu Glu Lys Tyr Thr
    1145                1150                1155

AAC TAC AGC ATC CAG GTG CTG GCC TTC ACC CGC GCA GGA GAC GGG GTC 3977
Asn Tyr Ser Ile Gln Val Leu Ala Phe Thr Arg Ala Gly Asp Gly Val
1160                1165                1170                1175

AGG AGT GAG CAG ATC TTC ACC CGG ACC AAA GAG GAT GTT CCA GGT CCT 4025
Arg Ser Glu Gln Ile Phe Thr Arg Thr Lys Glu Asp Val Pro Gly Pro
                1180                1185                1190

CCC GCG GGT GTG AAG GCA GCG GCG GCC TCA GCC TCC ATG GTC TTT GTG 4073 
Pro Ala Gly Val Lys Ala Ala Ala Ala Ser Ala Ser Met Val Phe Val 
1195 1200 1205

TCC TGG CTT CCC CCT CTC AAG CTG AAC GGC ATC ATC CGA AAG TAC ACT 4121
Ser Trp Leu Pro Pro Leu Lys Leu Asn Gly Ile Ile Arg Lys Tyr Thr
1210 1215 1220

GTA TTC TGC TCC CAC CCC TAT CCC ACA GTG ATC AGC GAG TTT GAG GCC 4169
Val Phe Cys Ser His Pro Tyr Pro Thr Val Ile Ser Glu Phe Glu Ala
1225 1230 1235

TCT CCC GAC TCG TTT TCC TAC AGA ATT CCC AAC CTG AGT AGG AAT CGT 4217
Ser Pro Asp Ser Phe Ser Tyr Arg Ile Pro Asn Leu Ser Arg Asn Arg
1240 1245 1250 1255

CAG TAC AGC GTC TGG GTG GTG GCT GTT ACT TCA GCC GGA AGA GGC AAC 4265
Gln Tyr Ser Val Trp Val Val Ala Val Thr Ser Ala Gly Arg Gly Asn
1260 1265 1270

AGC AGT GAA ATC ATC ACA GTC GAG CCA CTA GCA AAA GCT CCT GCA CGA 4313
Ser Ser Glu Ile Ile Thr Val Glu Pro Leu Ala Lys Ala Pro Ala Arg
1275 1280 1285

ATC CTG ACC TTC AGT GGG ACA GTG ACT ACT CCA TGG ATG AAA GAC ATT 4361
Ile Leu Thr Phe Ser Gly Thr Val Thr Thr Pro Trp Met Lys Asp Ile
1290 1295 1300

GTC TTG CCT TGT AAG GCT GTT GGG GAC CCT TCT CCT GCA GTC AAA TGG 4409
Val Leu Pro Cys Lys Ala Val Gly Asp Pro Ser Pro Ala Val Lys Trp
1305 1310 1315

ATG AAA GAC AGT AAC GGG ACA CCC AGT CTA GTA ACG ATT GAT GGG CGG 4457
Met Lys Asp Ser Asn Gly Thr Pro Ser Leu Val Thr Ile Asp Gly Arg
1320 1325 1330 1335

AGG AGC ATC TTT AGC AAC GGA AGC TTC ATT ATT CGC ACG GTG AAA GCA 4505
Arg Ser Ile Phe Ser Asn Gly Ser Phe Ile Ile Arg Thr Val Lys Ala
1340 1345 1350

GAA GAC TCC GGC TAT TAC AGC TGC ATT GCC AAT AAC AAC TGG GGA TCT 4553
Glu Asp Ser Gly Tyr Tyr Ser Cys Ile Ala Asn Asn Asn Trp Gly Ser
1355 1360 1365

GAT GAA ATT ATT TTA AAC TTA CAA GTA CAA GTT CCA CCA GAT CAG CCT 4601
Asp Glu Ile Ile Leu Asn Leu Gln Val Gln Val Pro Pro Asp Gln Pro
1370 1375 1380

CGG CTT ACA GTC TCC AAG ACC ACG TCT TCC TCC ATC ACC CTT TCT TGG 4649
Arg Leu Thr Val Ser Lys Thr Thr Ser Ser Ser Ile Thr Leu Ser Trp
1385 1390 1395

CTC CCT GGA GAC AAC GGG GGC AGC TCT ATC AGA GGA TAC ATA CTG CAG 4697
Leu Pro Gly Asp Asn Gly Gly Ser Ser Ile Arg Gly Tyr Ile Leu Gln
1400 1405 1410 1415

TAC TCC GAG GAC AAT AGT GAG CAG TGG GGG AGT TTT CCA ATC AGC CCC 4745
Tyr Ser Glu Asp Asn Ser Glu Gln Trp Gly Ser Phe Pro Ile Ser Pro
1420 1425 1430

AGC GAA CGT TCC TAT CGC TTG GAA AAT CTC AAA TGT GGG ACT TGG TAT 4793
Ser Glu Arg Ser Tyr Arg Leu Glu Asn Leu Lys Cys Gly Thr Trp Tyr
1435 1440 1445

AAG TTC ACA CTG ACA GCC CAA AAT GGA GTG GGC CCA GGG CGC ATA AGT 4841
Lys Phe Thr Leu Thr Ala Gln Asn Gly Val Gly Pro Gly Arg Ile Ser
1450 1455 1460

GAA ATC ATA GAA GCA AAG ACC TTA GGA AAA GAG CCC CAG TTC TCA AAG 4889
Glu Ile Ile Glu Ala Lys Thr Leu Gly Lys Glu Pro Gln Phe Ser Lys
1465 1470 1475

GAG CAG GAG CTG TTT GCC AGC ATC AAC ACC ACA CGC GTG AGG CTG AAC 4937
Glu Gln Glu Leu Phe Ala Ser Ile Asn Thr Thr Arg Val Arg Leu Asn
1480 1485 1490 1495

CTC ATT GGC TGG AAT GAT GGC GGC TGC CCC ATC ACC TCC TTC ACA CTA 4985
Leu Ile Gly Trp Asn Asp Gly Gly Cys Pro Ile Thr Ser Phe Thr Leu
1500 1505 1510

GAG TAC AGG CCC TTT GGG ACC ACA GTT TGG ACC ACA GCT CAG AGG ACC 5033
Glu Tyr Arg Pro Phe Gly Thr Thr Val Trp Thr Thr Ala Gln Arg Thr
1515 1520 1525

TCT CTC TCC AAG TCC TAC ATC CTG TAT GAC CTG CAG GAA GCC ACC TGG 5081
Ser Leu Ser Lys Ser Tyr Ile Leu Tyr Asp Leu Gln Glu Ala Thr Trp
1530 1535 1540

TAT GAG CTG CAG ATG CGG GTG TGC AAC AGT GCG GGC TGC GCG GAG AAG 5129
Tyr Glu Leu Gln Met Arg Val Cys Asn Ser Ala Gly Cys Ala Glu Lys
1545 1550 1555

CAG GCC AAC TTC GCT ACG CTG AAC TAC GAT GGC AGT ACA ATT CCT CCA 5177
Gln Ala Asn Phe Ala Thr Leu Asn Tyr Asp Gly Ser Thr Ile Pro Pro
1560 1565 1570 1575

CTC ATT AAG TCA GTT GTC CAA AAC GAA GAA GGG CTG ACG ACC AAC GAG 5225
Leu Ile Lys Ser Val Val Gln Asn Glu Glu Gly Leu Thr Thr Asn Glu
1580 1585 1590

GGG CTC AAG ATG CTG GTG ACC ATC TCC TGT ATC CTG GTG GGG GTC TTG 5273
Gly Leu Lys Met Leu Val Thr Ile Ser Cys Ile Leu Val Gly Val Leu
1595 1600 1605

CTG CTG TTT GTG CTC CTG CTG GTT GTG CGG AGG AGG CGG CGG GAG CAG 5321
Leu Leu Phe Val Leu Leu Leu Val Val Arg Arg Arg Arg Arg Glu Gln
1610 1615 1620

AGG CTA AAG AGG CTG CGA GAT GCA AAG AGT TTA GCT GAA ATG CTC ATG 5369
Arg Leu Lys Arg Leu Arg Asp Ala Lys Ser Leu Ala Glu Met Leu Met
1625 1630 1635

AGT AAG AAT ACC CGG ACT TCA GAT ACG TTA AGC AAG CAA CAG CAG ACC 5417
Ser Lys Asn Thr Arg Thr Ser Asp Thr Leu Ser Lys Gln Gln Gln Thr
1640 1645 1650 1655

CTG CGA ATG CAC ATC GAC ATA CCC AGG GCT CAG CTT TTG ATT GAA GAG 5465
Leu Arg Met His Ile Asp Ile Pro Arg Ala Gln Leu Leu Ile Glu Glu
1660 1665 1670

AGA GAC ACG ATG GAG ACC ATT GAT GAT CGC TCC ACG GTT CTG TTG ACG 5513
Arg Asp Thr Met Glu Thr Ile Asp Asp Arg Ser Thr Val Leu Leu Thr
1675 1680 1685

GAT GCT GAC TTT GGA GAG GCA GCT AAG CAG AAG TCC CTG ACG GTC ACT 5561
Asp Ala Asp Phe Gly Glu Ala Ala Lys Gln Lys Ser Leu Thr Val Thr
1690 1695 1700

CAC ACG GTC CAT TAC CAA TCG GTG TCT CAG GCC ACT GGG CCC TTA GTG 5609
His Thr Val His Tyr Gln Ser Val Ser Gln Ala Thr Gly Pro Leu Val
1705 1710 1715

GAT GTT TCA GAC GCT CGG CCG GGA ACG AAT CCC ACC ACC AGG AGG AAT 5657
Asp Val Ser Asp Ala Arg Pro Gly Thr Asn Pro Thr Thr Arg Arg Asn
1720 1725 1730

GCC AAG GCT GGG CCC ACA GCG AGA AAC CGC TAT GCC AGC CAG TGG ACC 5705
Ala Lys Ala Gly Pro Thr Ala Arg Asn Arg Tyr Ala Ser Gln Trp Thr
1740 1745 1750

CTC AAC CGA CCC CAC CCC ACC ATC TCA GCA CAC ACC CTC ACC ACA GAC 5753
Leu Asn Arg Pro His Pro Thr Ile Ser Ala His Thr Leu Thr Thr Asp
1755 1760 1765

TGG AGG CTG CCA ACA CCC AGG GCT GCA GGA TCA GTA GAC AAA GAG AGC 5801
Trp Arg Leu Pro Thr Pro Arg Ala Ala Gly Ser Val Asp Lys Glu Ser
1770 1775 1780

GAC AGT TAC AGC GTC AGC CCC TCG CAA GAC ACA GAT CGA GCA AGA AGC 5849
Asp Ser Tyr Ser Val Ser Pro Ser Gln Asp Thr Asp Arg Ala Arg Ser
1785 1790 1795

AGC ATG GTC TCC ACA GAA AGT GCC TCC TCC ACT TAC GAA GAA CTG GCC 5897
Ser Met Val Ser Thr Glu Ser Ala Ser Ser Thr Tyr Glu Glu Leu Ala
1800 1805 1810 1815

AGG GCC TAC GAA CAC GCC AAG ATG GAA GAG CAA CTG AGG CAC GCC AAG 5945
Arg Ala Tyr Glu His Ala Lys Met Glu Glu Gln Leu Arg His Ala Lys
1820 1825 1830

TTC ACC ATC ACG GAG TGC TTC ATA TCA GAC ACG TCA TCG GAG CAG TTG 5993
Phe Thr Ile Thr Glu Cys Phe Ile Ser Asp Thr Ser Ser Glu Gln Leu
1835 1840 1845

ACG GCA GGG ACA AAT GAG TAC ACG GAC AGT CTG ACC TCC AGC ACC CCT 6041
Thr Ala Gly Thr Asn Glu Tyr Thr Asp Ser Leu Thr Ser Ser Thr Pro
1850 1855 1860

TCC GAA TCG GGA ATC TGC AGG TTC ACT GCA TCT CCC CCC AAA CCT CAG 6089
Ser Glu Ser Gly Ile Cys Arg Phe Thr Ala Ser Pro Pro Lys Pro Gln
1865 1870 1875

GAT GGA GGA AGA GTA ATG AAT ATG GCA GTT CCA AAG GCA ATC GGC CAG 6137
Asp Gly Gly Arg Val Met Asn Met Ala Val Pro Lys Ala Ile Gly Gln
1880 1885 1890 1895

GTG ACC TCA TAC ATT TGC CTC CAT ACC TTA GAA TGG ACT TTT TGT TAAACCGAGG 6192
Val Thr Ser Tyr Ile Cys Leu His Thr Leu Glu Trp Thr Phe Cys
1900 1905 1910

TGGTCCAGGC ACCAGCAGGG ACCTGAGCTT AGGACAAGCA TGCTTGGAAC CTCAGAAAAG 6252

CCGGACCCTG AAGCGCCCCA CGGTCCTGGA GCCCATCCCG ATGGAAGCCG CCTCCTCCGC 6312

CTCCTCCACG AGAGAAGGAC AGTCGTGGCA GCCGGGGGCC GTGGCCACAT TACCTCAGCG 6372

GGAGGGAGCA GAGCTGGGAC AGGCAGCTAA AATGAGCAGC TCCCAAGAAT CACTGCTCGA 6432

CTCCCGGGGC CATTTGAAAG GAAACAATCC TTACGCAAAA TCTTACACCC TGGTATAACA 6492

GACAGCATGA CTGGACAGCG GTTGTAAATA CAATTCAAAC AATTCAATCA AAGCTACCTT 6552

TTTTTTACGG AATTCCAATA TTTATAATTA AAGAAAATTG CCAAAATATA TT 6604

(2) INFORMATION FOR SEQ ID NO: 2:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1910 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Trp Ile Leu Ala Leu Ser Leu Phe Gln Ser Phe Ala Asn Val Phe
1 5 10 15

Ser Glu Asp Leu His Ser Ser Leu Tyr Phe Val Asn Ala Ser Leu Gln
20 25 30

Glu Val Val Phe Ala Ser Thr Thr Gly Thr Leu Val Pro Cys Pro Ala
35 40 45

Ala Gly Ile Pro Pro Val Thr Leu Arg Trp Tyr Leu Ala Thr Gly Glu
50 55 60

Glu Ile Tyr Asp Val Pro Gly Ile Arg His Val His Pro Asn Gly Thr
65 70 75 80

Leu Gln Ile Phe Pro Phe Pro Pro Ser Ser Phe Ser Thr Leu Ile His
85 90 95

Asp Asn Thr Tyr Tyr Cys Thr Ala Glu Asn Pro Ser Gly Lys Ile Arg
100 105 110

Ser Gln Asp Val His Ile Lys Ala Val Leu Arg Glu Pro Tyr Thr Val
115 120 125

Arg Val Glu Asp Gln Lys Thr Met Arg Gly Asn Val Ala Val Phe Lys
130 135 140

Cys Ile Ile Pro Ser Ser Val Glu Ala Tyr Ile Thr Val Val Ser Trp
145 150 155 160

Glu Lys Asp Thr Val Ser Leu Val Ser Gly Ser Arg Phe Leu Ile Thr
165 170 175

Ser Thr Gly Ala Leu Tyr Ile Lys Asp Val Gln Asn Glu Asp Gly Leu
180 185 190

Tyr Asn Tyr Arg Cys Ile Thr Arg His Arg Tyr Thr Gly Glu Thr Arg
195 200 205

Gln Ser Asn Ser Ala Arg Leu Phe Val Ser Asp Pro Ala Asn Ser Ala
210 215 220

Pro Ser Ile Leu Asp Gly Phe Asp His Arg Lys Ala Met Ala Gly Gln
225 230 235 240

Arg Val Glu Leu Pro Cys Lys Ala Leu Gly His Pro Glu Pro Asp Tyr
245 250 255

Arg Trp Leu Lys Asp Asn Met Pro Leu Glu Leu Ser Gly Arg Phe Gln
260 265 270

Lys Thr Val Thr Gly Leu Leu Ile Glu Asn Ile Arg Pro Ser Asp Ser
275 280 285

Gly Ser Tyr Val Cys Glu Val Ser Asn Arg Tyr Gly Thr Ala Lys Val
290 295 300

Ile Gly Arg Leu Tyr Val Lys Gln Pro Leu Lys Ala Thr Ile Ser Pro
305 310 315 320

Arg Lys Val Lys Ser Ser Val Gly Ser Gln Val Ser Leu Ser Cys Ser
325 330 335

Val Thr Gly Thr Glu Asp Gln Glu Leu Ser Trp Tyr Arg Asn Gly Glu
340 345 350

Ile Leu Asn Pro Gly Lys Asn Val Arg Ile Thr Gly Ile Asn His Glu
355 360 365

Asn Leu Ile Met Asp His Met Val Lys Ser Asp Gly Gly Ala Tyr Gln
370 375 380

Cys Phe Val Arg Lys Asp Lys Leu Ser Ala Gln Asp Tyr Val Gln Val
385 390 395 400

Val Leu Glu Asp Gly Thr Pro Lys Ile Ile Ser Ala Phe Ser Glu Lys
405 410 415

Val Val Ser Pro Ala Glu Pro Val Ser Leu Met Cys Asn Val Lys Gly
420 425 430

Thr Pro Leu Pro Thr Ile Thr Trp Thr Leu Asp Asp Asp Pro Ile Leu
435 440 445

Lys Gly Gly Ser His Arg Ile Ser Gln Met Ile Thr Ser Glu Gly Asn
450 455 460

Val Val Ser Tyr Leu Asn Ile Ser Ser Ser Gln Val Arg Asp Gly Gly
465 470 475 480

Val Tyr Arg Cys Thr Ala Asn Asn Ser Ala Gly Val Val Leu Tyr Gln
485 490 495

Ala Arg Ile Asn Val Arg Gly Pro Ala Ser Ile Arg Pro Met Lys Asn
500 505 510

Ile Thr Ala Ile Ala Gly Arg Asp Thr Tyr Ile His Cys Arg Val Ile
515 520 525

Gly Tyr Pro Tyr Tyr Ser Ile Lys Trp Tyr Lys Asn Ser Asn Leu Leu
530 535 540

Pro Phe Asn His Arg Gln Val Ala Phe Glu Asn Asn Gly Thr Leu Lys
545 550 555 560

Leu Ser Asp Val Gln Lys Glu Val Asp Glu Gly Glu Tyr Thr Cys Asn
565 570 575

Val Leu Val Gln Pro Gln Leu Ser Thr Ser Gln Ser Val His Val Thr
580 585 590

Val Lys Val Pro Pro Phe Ile Gln Pro Phe Glu Phe Pro Arg Phe Ser
595 600 605

Ile Gly Gln Arg Val Phe Ile Pro Cys Val Val Val Ser Gly Asp Leu
610 615 620

Pro Ile Thr Ile Thr Trp Gln Lys Asp Gly Arg Pro Ile Pro Gly Ser
625 630 635 640

Leu Gly Val Thr Ile Asp Asn Ile Asp Phe Thr Ser Ser Leu Arg Ile
645 650 655

Ser Asn Leu Ser Leu Met His Asn Gly Asn Tyr Thr Cys Ile Ala Arg
660 665 670

Asn Glu Ala Ala Ala Val Glu His Gln Ser Gln Leu Ile Val Arg Val
675 680 685

Pro Pro Lys Phe Val Val Gln Pro Arg Asp Gln Asp Gly Ile Tyr Gly
690 695 700

Lys Ala Val Ile Leu Asn Cys Ser Ala Glu Gly Tyr Pro Val Pro Thr
705 710 715 720

Ile Val Trp Lys Phe Ser Lys Gly Ala Gly Val Pro Gln Phe Gln Pro
725 730 735

Hp Ala Leu Asn Gly Arg Ile Gln Val Leu Ser Asn Gly Ser Leu Leu
740 745 750

Ile Lys His Val Val Glu Glu Asp Ser Gly Tyr Tyr Leu Cys Lys Val
755 760 765

Ser Asn Asp Val Gly Ala Asp Val Ser Lys Ser Met Tyr Leu Thr Val
770 775 780

Lys Ile Pro Ala Met Ile Thr Ser Tyr Pro Asn Thr Thr Leu Ala Thr
785 790 795 800

Gln Gly Gln Lys Lys Glu Met Ser Cys Thr Ala His Gly Glu Lys Pro
805 810 815

Ile Ile Val Arg Trp Glu Lys Glu Asp Arg Ile Ile Asn Pro Glu Met
820 825 830

Ala Arg Tyr Leu Val Ser Thr Lys Glu Val Gly Glu Glu Val Ile Ser
835 840 845

Thr Leu Gln Ile Leu Pro Thr Val Arg Glu Asp Ser Gly Phe Phe Ser
850 855 860

Cys His Ala Ile Asn Ser Tyr Gly Glu Asp Arg Gly Ile Ile Gln Leu
865 870 875 880

Thr Val Gln Glu Pro Pro Asp Pro Pro Glu Ile Glu Ile Lys Asp Val
885 890 895

Lys Ala Arg Thr Ile Thr Leu Arg Trp Thr Met Gly Phe Asp Gly Asn
900 905 910

Ser Pro Ile Thr Gly Tyr Asp Ile Glu Cys Lys Asn Lys Ser Asp Ser
915 920 925

Trp Asp Ser Ala Gln Arg Thr Lys Asp Val Ser Pro Gln Leu Asn Ser
930 935 940

Ala Thr Ile Ile Asp Ile His Pro Ser Ser Thr Tyr Ser Ile Arg Met
945 950 955 960

Tyr Ala Lys Asn Arg Ile Gly Lys Ser Glu Pro Ser Asn Glu Leu Thr
965 970 975

Ile Thr Ala Asp Glu Ala Ala Pro Asp Gly Pro Pro Gln Glu Val His
980 985 990

Leu Glu Pro Ile Ser Ser Gln Ser Ile Arg Val Thr Trp Lys Ala Pro
995 1000 1005

Lys Lys His Leu Gln Asn Gly Ile Ile Arg Gly Tyr Gln Ile Gly Tyr
1010 1015 1020

Arg Glu Tyr Ser Thr Gly Gly Asn Phe Gln Phe Asn Ile Ile Ser Val
1025 1030 1035 1040

Asp Thr Ser Gly Asp Ser Glu Val Tyr Thr Leu Asp Asn Leu Asn Lys
1045 1050 1055

Phe Thr Gln Tyr Gly Leu Val Val Gln Ala Cys Asn Arg Ala Gly Thr
1060 1065 1070

Gly Pro Ser Ser Gln Glu Ile Ile Thr Thr Thr Leu Glu Asp Val Pro
1075 1080 1085

Ser Tyr Pro Pro Glu Asn Val Gln Ala Ile Ala Thr Ser Pro Glu Ser
1090 1095 1100

Ile Ser Ile Ser Trp Ser Thr Leu Ser Lys Glu Ala Leu Asn Gly Ile
1105 1110 1115 1120

Leu Gln Gly Phe Arg Val Ile Tyr Trp Ala Asn Leu Met Asp Gly Glu
1125 1130 1135

Leu Gly Glu Ile Lys Asn Ile Thr Thr Thr Gln Pro Ser Leu Glu Leu
1140 1145 1150

Asp Gly Leu Glu Lys Tyr Thr Asn Tyr Ser Ile Gln Val Leu Ala Phe
1155 1160 1165

Thr Arg Ala Gly Asp Gly Val Arg Ser Glu Gln Ile Phe Thr Arg Thr
1170 1175 1180

Lys Glu Asp Val Pro Gly Pro Pro Ala Gly Val Lys Ala Ala Ala Ala
1185 1190 1195 1200

Ser Ala Ser Met Val Phe Val Ser Trp Leu Pro Pro Leu Lys Leu Asn
1205 1210 1215

Gly Ile Ile Arg Lys Tyr Thr Val Phe Cys Ser His Pro Tyr Pro Thr
1220 1225 1230

Val Ile Ser Glu Phe Glu Ala Ser Pro Asp Ser Phe Ser Tyr Arg Ile
1235 1240 1245

Pro Asn Leu Ser Arg Asn Arg Gln Tyr Ser Val Trp Val Val Ala Val
1250 1255 1260

Thr Ser Ala Gly Arg Gly Asn Ser Ser Glu Ile Ile Thr Val Glu Pro
1265 1270 1275 1280

Leu Ala Lys Ala Pro Ala Arg Ile Leu Thr Phe Ser Gly Thr Val Thr
1285 1290 1295

Thr Pro Trp Met Lys Asp Ile Val Leu Pro Cys Lys Ala Val Gly Asp
1300 1305 1310

Pro Ser Pro Ala Val Lys Trp Met Lys Asp Ser Asn Gly Thr Pro Ser
1315 1320 1325

Leu Val Thr Ile Asp Gly Arg Arg Ser Ile Phe Ser Asn Gly Ser Phe
1330 1335 1340

Ile Ile Arg Thr Val Lys Ala Glu Asp Ser Gly Tyr Tyr Ser Cys Ile
1345 1350 1355 1360

Ala Asn Asn Asn Trp Gly Ser Asp Glu Ile Ile Leu Asn Leu Gln Val
1365 1370 1375

Gln Val Pro Pro Asp Gln Pro Arg Leu Thr Val Ser Lys Thr Thr Ser
1380 1385 1390

Ser Ser Ile Thr Leu Ser Trp Leu Pro Gly Asp Asn Gly Gly Ser Ser
1395 1400 1405

Ile Arg Gly Tyr Ile Leu Gln Tyr Ser Glu Asp Asn Ser Glu Gln Trp
1410 1415 1420

Gly Ser Phe Pro Ile Ser Pro Ser Glu Arg Ser Tyr Arg Leu Glu Asn
1425 1430 1435 1440

Leu Lys Cys Gly Thr Trp Tyr Lys Phe Thr Leu Thr Ala Gln Asn Gly
1445 1450 1455

Val Gly Pro Gly Arg Ile Ser Glu Ile Ile Glu Ala Lys Thr Leu Gly
1460 1465 1470

Lys Glu Pro Gln Phe Ser Lys Glu Gln Glu Leu Phe Ala Ser Ile Asn
1475 1480 1485

Thr Thr Arg Val Arg Leu Asn Leu Ile Gly Trp Asn Asp Gly Gly Cys
1490 1495 1500

Pro Ile Thr Ser Phe Thr Leu Glu Tyr Arg Pro Phe Gly Thr Thr Val
1505 1510 1515 1520

Trp Thr Thr Ala Gln Arg Thr Ser Leu Ser Lys Ser Tyr Ile Leu Tyr
1525 1530 1535

Asp Leu Gln Glu Ala Thr Trp Tyr Glu Leu Gln Met Arg Val Cys Asn
1540 1545 1550

Ser Ala Gly Cys Ala Glu Lys Gln Ala Asn Phe Ala Thr Leu Asn Tyr
1555 1560 1565

Asp Gly Ser Thr Ile Pro Pro Leu Ile Lys Ser Val Val Gln Asn Glu
1570 1575 1580

Glu Gly Leu Thr Thr Asn Glu Gly Leu Lys Met Leu Val Thr Ile Ser
1585 1590 1595 1600

Cys Ile Leu Val Gly Val Leu Leu Leu Phe Val Leu Leu Leu Val Val
1605 1610 1615

Arg Arg Arg Arg Arg Glu Gln Arg Leu Lys Arg Leu Arg Asp Ala Lys
1620 1625 1630

Ser Leu Ala Glu Met Leu Met Ser Lys Asn Thr Arg Thr Ser Asp Thr
1635 1640 1645

Leu Ser Lys Gln Gln Gln Thr Leu Arg Met His Ile Asp Ile Pro Arg
1650 1655 1660

Ala Gln Leu Leu Ile Glu Glu Arg Asp Thr Met Glu Thr Ile Asp Asp
1665 1670 1675 1680

Arg Ser Thr Val Leu Leu Thr Asp Ala Asp Phe Gly Glu Ala Ala Lys
1685 1690 1695

Gln Lys Ser Leu Thr Val Thr His Thr Val His Tyr Gln Ser Val Ser
1700 1705 1710

Gln Ala Thr Gly Pro Leu Val Asp Val Ser Asp Ala Arg Pro Gly Thr
1715 1720 1725

Asn Pro Thr Thr Arg Arg Asn Ala Lys Ala Gly Pro Thr Ala Arg Asn
1730 1735 1740

Arg Tyr Ala Ser Gln Trp Thr Leu Asn Arg Pro His Pro Thr Ile Ser
1745 1750 1755 1760

Ala His Thr Leu Thr Thr Asp Trp Arg Leu Pro Thr Pro Arg Ala Ala
1765 1770 1775

Gly Ser Val Asp Lys Glu Ser Asp Ser Tyr Ser Val Ser Pro Ser Gln
1780 1785 1790

Asp Thr Asp Arg Ala Arg Ser Ser Met Val Ser Thr Glu Ser Ala Ser
1795 1800 1805

Ser Thr Tyr Glu Glu Leu Ala Arg Ala Tyr Glu His Ala Lys Met Glu
1810 1815 1820

Glu Gln Leu Arg His Ala Lys Phe Thr Ile Thr Glu Cys Phe Ile Ser
1825 1830 1835 1840

Asp Thr Ser Ser Glu Gln Leu Thr Ala Gly Thr Asn Glu Tyr Thr Asp
1845 1850 1855

Ser Leu Thr Ser Ser Thr Pro Ser Glu Ser Gly Ile Cys Arg Phe Thr
1860 1865 1870

Ala Ser Pro Pro Lys Pro Gln Asp Gly Gly Arg Val Met Asn Met Ala
1875 1880 1885

Val Pro Lys Ala Ile Gly Gln Val Thr Ser Tyr Ile Cys Leu His Thr
1890 1895 1900

Leu Glu Trp Thr Phe Cys
1905 1910

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 388 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 



CCGGGTATTC TTACTCATGA GCATTTCAGC TAAACTCTTT GCATCTCGCA GCCTCTTTAG 60

CCTCTGCTCC CGCCGCCTCC TCCGCACAAC CAGCAGGAGC ACAAACAGCA GCAAGACCCC 120

CACCAGGATA CAGGAGATGG TCACCAGCAT CTTGAGCCCC TCGTTGGTCG TCAGCCCTTC 180

TTCGTTTTGG ACAACTGACT TAATGAGTGG AGGAATTGTA CTGCCATCGT AGTTCAGCGT 240

AGCGAAGTTG GCCTGCTTCT CCGCGCAGCC CGCACTGTTG CACACCCGCA TCTGCAGCTC 300

ATACCAGGTG GCTTCCTGCA GGTCATACAG GATGTAGGAC TTGGAGAGAG AGGTCCTCTG 360

AGCTGTGGTC CAAACTGTGG TCCCAAAG 388



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CCTGATGCTC GAGTGAATTC 



(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCAGTTCTCA AAGGAGCAGG 20

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CCTGTATGAC CTGCAGGAAG 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 842 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



CCGGGCCGGG CGCGGCGGAG CGCAGCGCAA CGCGGGGGGC GAGGCCGGCG CGTGGCTCGC 60

TCGCTGGCTC GCTGGCTCGC GGGAGGCCGG GCAGCAGCAG GGGCATGTGG ATACTGGCTC 120

TCTCCTTGTT CCAGAGCTTC GCGAATGTTT TCAGTGAAGA GCCCCACTCC AGCCTCTACT 180

TTGTCAATGC ATCGCTGCAA GAGGTAGTGT TTGCAAGCAC ATCGGGGACG CTGGTGCCCT 240

GCCCGGCTGC AGGCATCCCT CCTGTGACTC TCAGATGGTA CCTAGCAACG GGCGAGGAGA 300

TCTACGATGT CCCCGGGATC CGCCACGTCC ATCCCAATGG CACTCTCCAA ATTTTCCCCT 360

TTCCTCCTTC AAGCTTCAGC ACCTTAATCC ATGATAATAC TTACTATTGC ACAGCTGAAA 420

ACCCTTCAGG GAAAATTAGA AGTCAGGATG TCCACATCAA GGCTGTTTTA CGGGAGCCCT 480

ATACAGTCCG TGTGGAGGAC CAGAAAACCA TGAGAGGCAA TGTCGCGGTG TTCAAGTGCA 540

TTATCCCCTC CTCGGTGGAG GCGTACGTCT CTGTCGTCTC ATGGGAGAAA GACACGGTTT 600

CACTTGTCTC AGGATCTAGA TTTCTCATCA CATCCACGGG AGCCTTGTAT ATTAAAGATG 660

TTCAGAACGA AGATGGGCTG TACAACTACC GCTGCATCGC GCGGCACAGA TTCGCGGGGG 720

AGACGAGACA GAGCAACTGC GCGAGACTGT TCGTGTCAGA ACCAGCAAAC TCAGCCCATC 780

CATCCTGGAA GGGTTTGACC ACCGCCAAAC CATGGCCGGG CACGCGTGGA GCTGCCTTGC 840

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 898 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 










TTGCAAGCCT GTACTACAGG CCATACTGCG TGAATTATCA GGTTGTCCAG 60

GGTGTACACT TCGCTGTLLC ATACTGATGA TGTTGAACTG GAAGTTACCC 120

CGTGCTGTAC TCCGGTAGCC TATTGGTAGC CGCGAATGAT CCCGTCTTGT ATAGTGTTCT 180

TGGGAGCCTC TCCAGGTAAC CCTGATACTC TGAGATGAGG TGGGTTCCAA GTGAACTTCC 240

TGAGGTGGAC ATCACGAGCT GCCTCATCCG CCGTGATGG? GATCTCGTTG CTGGGCTCAC 300

TCTTGCCAAT CCGGTTCTTG GCGTACATGC GGATGCTGTA GGTGGAGGAA GGGTGGATAT 360

CAATGATGGT GGCCGAGTTC AGCTGAGGGG AAACATCTTT GGTTCTCTGA GCAGAATCCC 420

ACGAGTCTGA TTTATTTTTG CATTCACACT GTCATAGCCT GTGATGGGGC TGTTGCCATC 480

AAACCCCATG GTCCACCTGA GCGTGATGGT ^CGAGCTTTG ACATCTCTTG ATCTCAATCT 540

CGGGAGGATC TGGGGGTTCT TGCACTGTGA GTTGAATTAT TCCACGGTCC TCCCCGTATG 600

AATTGATAGC ATGGCAGGAG AAGAAACCGG AATCTTCTCT CACTGTTGGC AAAATCTGCA 660

GCGTAGATAT CACTTCCTCT CCCACCTCCT TGGTGGATAC AGTACGGGCC ACTTTCAGGG 720

TTAATGATCC TGTCTCTCTT CTCCAGCGGA CAATGATGGG CTCTCCCATG GGCTGTGCAG 780

CTCATTCCTT CCTTTGACCC TGATGGCCAG GTGGTGTGGG TATAAGTTAT ATCATGGCCG 840

GAATTTCCCT GTGAGTCCAT GGACTTGCTG AACGTTCTGC GCCCACATCG TTCGCTGA 898



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2173 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
ACCACCATTC ACACACCCAG ACATGGCGGG TTCGCGGCAA CCTTCAGTTC CTGGCCTTCC
TGTAGGGTAA AGGGCTGCTG CGGGTTTATA GACCGGCACA TGCCCATCCT GGCATACGGT
GGCCAGTGGC TTTCCATCTG GATTCCAGGC CAAGCTAAAA ATCTGTTCCT GATGGCCCTG
CAGTTTCAGC


CGTTCAGCTC 


CAGTCTGAAG 


1 1 bbbAbAib 


pp a appptt A 

bbAAbboi LH 


PATPATAP.P.A 


240 


ACTGGAAGCC 


AGTACATCGG 


CAGCCAGGGG 


b 1 bo/\noLuL 


apapaptapa 


TPTTTTPTGT 


300 


GTGGCCTGTG 


AGCACAGTCT 


CAGGTGTGGT 


^A/^AAPATTP 

bAbAAbA 1 1 b 


tpp apppapp 


fiAGCGTTCAT 


360 


ACCGCTTGGA 


AAACCTAAAG 


TGTGGGACTT 


bb 1 Al AAb I 1 


P APPPTT APT 


ppppaaaatg 


420 


GAGTAGGTCC 


CGGGCGCATA 


AGTGAAATCA 


tapa appp a a 

J AbAAov^LAA 


aappptpppc; 


AAAGAACCCC 


480 


AGTTCTCCAA 


GGAGCAGGAG 


CTTTTCGCCA 


/"*/" l AT , /"'AA r PAP 

bbAi bAA 1 AL 


papppp aptp 


AP.P,CTGAATC 


540 


TGATTGGCTG 


GAATGACGGC 


GGCTGTCCAA 


TCACCTCAT I 


bAb 1 b 1 1 oAM 


TAPAPAPPPT ' 


600 


TTGGGACCAC 


GGTCTGGACC 


ACAGCTCAGC 


GGACCTbCCi 


J 1 bbAAb 1 bb 


t a ar ATTPTP 


660 


TATGACCTGC 


AAGAAGCCAC 


GTGGTATGAA 


CTGCAGATGA 


bAb 1 b 1 bbAA 


p a ppp pp pp. c 

bAbbbbboob 


170 


TGTGCGGATA 


AGCAAGCCAA 


CTTCGCCACG 


CTGAACTACG 


ATGGbAGI Ab 


AA 1 bbb 1 bbA 


i an 


CTCATTAAGT 


CAGTTGTCCA 


CAAAGCGAAG 


AAGGGCTGAC 


AACL AAbbAA 


bbbb 1 bAAbA 


0 'i J 


TCCTCGTGAC 


CATCTCCTGC 


ATCCTGGTCG 


GGGTTCTACT 


GCTwTTTbib 


bi 1 blbblbb 




TTGTGCGGAG 


GAGACGGCGA 


GAGCAGAGGC 


TGAAGAGGCT 


GAbAbAi bbA 


AAb AO 111 AO 




CTGAAATGCT 


CATGAGCAAA 


AACACACGGA 


CT IXAbAI Ab 


b i i AAbbAAA 


PAPPAPPAPA 


1020 


CTTTGAGAAT 


GCACATTGAT 


ATACCCAGGG 


Ci bAbb I 1 I I 


r'A'r'TPAAPAP 
b A i i bAAljAo 


AP APAPAPA A 


1080 


TGGAGACCAT 


AGATGACCGC 


TCCACAGTCC 


iGT i bAbbbA 


J. bb i bAL. lib 


ppr:p.AP.r:PAP. 


1140 


CCAAACAGAA 


GTCACTGACA 


GTGACTCACA 


CGbTbbAl J A 


bbAA 1 bolj 1 Kj 


TPTPAPPPPA 


1200 


CCGGGCCCCT 


CGTGGATGTC 


TCCGATGCTC 


GGCCAGbAAb 


bAAl bbbAbb 


APPAPPAPPA 


i. £. DVJ 


ATGCAAAGGC 


TGGACCCACA 


GCGAGAAACC 


GGTACGCCAG 


CCAbTbbAbb 


PTP A A PAP AP 
b 1 bAAb flWiL 


J. O-t u 


CCCATCCTAC 


CATCTCTGCA 


CACACCCTCA 


CCACAGAATG 


AGACTGCTAb 


AbbAbbb 1 Ab 




AGGATCCGTG 


ACAGGAGAGC 


GACAGTACAG 


CGTCAGCCCA 


TTCACAAGAb 


AbAbAbbAbb 


1 4 *i U 


AAGAAGCAGC 


ATGTTCTCCA 


CAGAAAGTGC 


TTCTTCTACC 


TACGAAGAbi 


bbbAbbbb 1 A 


1 jUU 


TGAACACGCC 


AAGATGGAAG 


AGCAGCTGAG 


GCATGCCAAG 


TTCACCATCA 


CAGAbi bbTT 


1 0 DU 


CATATCCGAT 


ACGTCCTCCG 


AGCAGTTGAC 


GGCAGGACAA 


ATGAGTACAC 


GGAbAb 1 b 1 b 




ACTCCAGTAC 


CCCTTCAGAA 


TCGGGATCTG 


CAGATTCATG 


CATbTbbbbb 


pa apptpapp 

bAAbb 1 L/ibb 


X DO U 


ATGGAGGACG AGTGTGAACA TGGCGGTTCC AAAGGCCCAT CGGCCAGGCG ACTCATACAC 1740

CTGCTCCATA CCTACGATGG ATTCTTGTTA AACCGGGCGC ACCAGGCACC AGCAGGACTG 1800

AGTTTAGGAC AAGCGTGCTT GGAACCCCAG AAAGTCGGAC CCTGAAACGC CCCACGGTCG 1860

TTGAGCCCAC CCCTATGGAG GCCTCCTCCT CCACTTCTTC CACGCGAGAA GGACAGCAGT 1920

CGTGGCAACA AGGGGCTGTG GCCACCTTAC CTCAGCGAGA GGGTGCAGAG CTGGACAGGC 1980

AGCTAAAATG AGCAGCTCCC AAGAGTCACT GCTGGACTCC CGGGCCATTG AAAGGAACAA 2040

TCCCTACGCA AATCTTACAC CTTGGTATAA CACATGGCAC TGATGGACAG CGGTTGTAAT 2100

ACAATTAACG AGCCAATCAA GCTACTTTTT TATGAATTCC GATATTTATA ATTAAGAATT
GCCAAATATA TTA

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6413 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 453..5168



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



TGACTGAGGC CGGAGCACGG CAAAGATGAG 


CCTGCCCGCC 


CGCCTGCTGC CTGGATGCGG 


60 


AGGGTGAGGG CTGGCGCACG GGAGGCCGCT 


GGCTGCGCAT 


TCTGGGCGCC GAGTGCCCGG 


120 


GATGAGCTCA CGCCCGCGTC TGCGGCTCTC 


TCCACCTGCC 


GACCTGCCGG GGGCCCACTG 


180 


AGCTGACGGC GCACCTGGGC TCCGGCCGCA 


GCGTGGGGCG 


CGGCGCCCGG GAGCAGGTGT 


240 


GCAGGAGCGCAGCGCGCGGC GAGCGCAGCC 


CTCGCTCCGG 


AGCCCGGCCG CGCCGCGTGC 


300 


CCGGGCGGCT AGGCAGCGGC GGCGGCGGCG 


GCGGGCGGCG 


GGCGGGCGGC GGCCCCCGGG 


360 


CAGGTGCCGA GCGGCGAGCG GAGCCGGGCC 


GGGCGGAGCG 


CGGGGGGCGA GGCCGGCGCG 


420 


TCGCTCGCGG GAGGCCGGGG AGCGGCAGGG 


GC 


ATG 
Met 
1 


TGG 
Trp 


ATA 
He 


CTG 
Leu 


GCT 
Ala 
5 


CTC 
Leu 


TCC 
Ser 


473 


TTG 
Leu 


TTC 
Phe 


CAG 
Gin 
10 


AGC 
Ser 


TTC 
Phe 


GCG 
Ala 


AAT 
Asn 


GTT 
Val 
15 


TTC 
Phe 


AGT 
Ser 


GAA 
Glu 


GAC 
Asp 


CTA 
Leu 
20 


CAC 
His 


TCC 
Ser 


AGC 
Ser 


521 


CTC 
Leu 


TAC 
Tyr 
25 


TTT 
Phe 


GTC 
Val 


AAT 
Asn 


GCA 
Ala 


TCT 
Ser 
30 


CTG 
Leu 


CAA 
Gin 


GAG 
Glu 


GTA 
Val 


GTG 
Val 
35 


TTT 
Phe 


GCC 
Ala 


AGC 
Ser 


ACC 
Thr 


569 


ACG 
Thr 
40 


GGG 
Gly 


ACT 
Thr 


CTG 
Leu 


GTG 
Val 


CCC 
Pro 
45 


TGC 
Cys 


CCC 
Pro 


GCA 
Ala 


GCA 
Ala 


GGC ATC 
Gly He 
50 


CCT 
Pro 


CCT 
Pro 


GTG 
Val 


ACT 
Thr 
55 


617 


CTC 
Leu 


AGA 
Arg 


TGG 
Trp 


TAC 
Tyr 


CTA 
Leu 
60 


GCC 
Ala 


ACG 
Thr 


GGC 
Gly 


GAG 
Glu 


GAG 
Glu 
65 


ATC TAC 
He Tyr 


GAT 
Asp 


GTC 
Val 


CCC 
Pro 
70 


GGG 
Gly 


665 


ATC 
He 


CGC 
Arg 


CAC 
His 


GTC 
Val 
75 


CAC 
His 


CCC 
Pro 


AAC 
Asn 


GGC 
Gly 


ACT 
Thr 
80 


CTC 
Leu 


CAA 
Gin 


ATT 
lie 


TTC 
Phe 


CCC 
Pro 
85 


TTC 
Phe 


CCT 
Pro 


713 


CCT 
Pro 


TCA 
Ser 


AGC 
Ser 


TTC 
Phe 


AGT 
Ser 


ACC 
Thr 


TTA 
Leu 


ATC 
He 


CAT 
His 


GAT 
Asp 


AAT 
Asn 


ACT 
Thr 


TAT 

Tyr 
i on 


TAT 
Tyr 


TGC 
Cys 


ACA 
Thr 


761 
GCT GAA AAT CCT TCA GGG AAA ATT AGA AGT CAG GAT GTC CAC ATC AAG 809
Ala Glu Asn Pro Ser Gly Lys Ile Arg Ser Gln Asp Val His Ile Lys
105 110 115

GCT GTT TTA CGG GAG CCC TAT ACA GTC CGT GTG GAG GAC CAG AAA ACC 857
Ala Val Leu Arg Glu Pro Tyr Thr Val Arg Val Glu Asp Gln Lys Thr
120 125 130 135

ATG AGA GGC AAT GTT GCG GTC TTC AAG TGC ATT ATC CCC TCC TCG GTG 905
Met Arg Gly Asn Val Ala Val Phe Lys Cys Ile Ile Pro Ser Ser Val
140 145 150

GAG GCG TAC ATC ACT GTC GTC TCA TGG GAG AAA GAC ACT GTT TCA CTT 953
Glu Ala Tyr Ile Thr Val Val Ser Trp Glu Lys Asp Thr Val Ser Leu
155 160 165

GTC TCA GGA TCT AGA TTT CTC ATC ACA TCC ACG GGA GCC TTG TAT ATT 1001
Val Ser Gly Ser Arg Phe Leu Ile Thr Ser Thr Gly Ala Leu Tyr Ile
170 175 180

AAA GAT GTA CAG AAT GAA GAT GGA TTG TAT AAC TAC CGC TGC ATC ACG 1049
Lys Asp Val Gln Asn Glu Asp Gly Leu Tyr Asn Tyr Arg Cys Ile Thr
185 190 195

CGG CAT CGA TAC ACC GGA GAG ACG AGG CAG AGC AAC AGC GCC AGA CTT 1097
Arg His Arg Tyr Thr Gly Glu Thr Arg Gln Ser Asn Ser Ala Arg Leu
200 205 210 215

TTT GTA TCA GAC CCA GCG AAC TCA GCC CCA TCC ATA CTG GAT GGG TTT 1145
Phe Val Ser Asp Pro Ala Asn Ser Ala Pro Ser Ile Leu Asp Gly Phe
220 225 230

GAC CAT CGC AAA GCC ATG GCT GGG CAG CGT GTG GAG CTG CCT TGC AAA 1193
Asp His Arg Lys Ala Met Ala Gly Gln Arg Val Glu Leu Pro Cys Lys
235 240 245

GCG CTC GGG CAC CCT GAG CCA GAT TAC CGC TGG CTG AAG GAC AAC ATG 1241
Ala Leu Gly His Pro Glu Pro Asp Tyr Arg Trp Leu Lys Asp Asn Met
250 255 260

CCC CTG GAA CTT TCA GGG AGG TTC CAG AAG ACC GTG ACG GGG CTG CTC 1289
Pro Leu Glu Leu Ser Gly Arg Phe Gln Lys Thr Val Thr Gly Leu Leu
265 270 275

ATT GAG AAC ATT CGC CCC TCG GAC TCA GGC AGC TAT GTT TGT GAA GTG 1337
Ile Glu Asn Ile Arg Pro Ser Asp Ser Gly Ser Tyr Val Cys Glu Val
280 285 290 295

TCC AAC AGA TAC GGA ACT GCT AAG GTG ATA GGC CGC CTG TAC GTG AAA 1385
Ser Asn Arg Tyr Gly Thr Ala Lys Val Ile Gly Arg Leu Tyr Val Lys
300 305 310

CAG CCA CTG AAA GCC ACC ATC AGT CCC AGG AAG GTT AAA AGC AGC GTG 1433
Gln Pro Leu Lys Ala Thr Ile Ser Pro Arg Lys Val Lys Ser Ser Val
315 320 325

GGT AGC CAA GTT TCC TTG TCC TGC AGC GTG ACA GGA ACT GAG GAC CAG 1481
Gly Ser Gln Val Ser Leu Ser Cys Ser Val Thr Gly Thr Glu Asp Gln
330 335 340

GAA CTC TCC TGG TAC CGC AAT GGT GAA ATC CTC AAC CCT GGA AAA AAT 1529
Glu Leu Ser Trp Tyr Arg Asn Gly Glu Ile Leu Asn Pro Gly Lys Asn
345 350 355

GTG AGG ATC ACA GGG ATC AAC CAC GAA AAC CTT ATA ATG GAT CAC ATG 1577
Val Arg Ile Thr Gly Ile Asn His Glu Asn Leu Ile Met Asp His Met
360 365 370 375

GTC AAA AGT GAC GGG GGC GCA TAC CAG TGC TTT GTG CGC AAG GAC AAG 1625
Val Lys Ser Asp Gly Gly Ala Tyr Gln Cys Phe Val Arg Lys Asp Lys
380 385 390

CTG TCC GCT CAA GAC TAT GTG CAG GTG GTC CTT GAA GAT GGA ACT CCC 1673
Leu Ser Ala Gln Asp Tyr Val Gln Val Val Leu Glu Asp Gly Thr Pro
395 400 405

AAA ATT ATT TCT GCC TTT AGT GAA AAG GTG GTG AGT CCA GCA GAG CCG 1721
Lys Ile Ile Ser Ala Phe Ser Glu Lys Val Val Ser Pro Ala Glu Pro
410 415 420

GTT TCC CTT ATG TGC AAC GTG AAG GGA ACA CCT TTG CCC ACG ATC ACG 1769
Val Ser Leu Met Cys Asn Val Lys Gly Thr Pro Leu Pro Thr Ile Thr
425 430 435

TGG ACC CTG GAC GAT GAC CCG ATT CTC AAG GGT GGC AGT CAC CGC ATC 1817
Trp Thr Leu Asp Asp Asp Pro Ile Leu Lys Gly Gly Ser His Arg Ile
440 445 450 455

AGC CAG ATG ATC ACG TCG GAG GGG AAC GTG GTC AGC TAC CTG AAC ATC 1865
Ser Gln Met Ile Thr Ser Glu Gly Asn Val Val Ser Tyr Leu Asn Ile
460 465 470

TCC AGC TCC CAG GTC CGG GAC GGG GGA GTC TAC CGC TGC ACT GCC AAC 1913
Ser Ser Ser Gln Val Arg Asp Gly Gly Val Tyr Arg Cys Thr Ala Asn
475 480 485

AAC TCG GCG GGA GTC GTC CTG TAC CAG GCT CGA ATA AAC GTA AGA GGG 1961
Asn Ser Ala Gly Val Val Leu Tyr Gln Ala Arg Ile Asn Val Arg Gly
490 495 500

CCT GCA AGC ATT CGA CCA ATG AAA AAC ATC ACA GCA ATA GCA GGA CGG 2009
Pro Ala Ser Ile Arg Pro Met Lys Asn Ile Thr Ala Ile Ala Gly Arg
505 510 515

GAC ACA TAC ATT CAC TGT CGT GTG ATT GGC TAT CCG TAT TAC TCC ATT 2057
Asp Thr Tyr Ile His Cys Arg Val Ile Gly Tyr Pro Tyr Tyr Ser Ile
520 525 530 535

AAA TGG TAC AAG AAC TCT AAC CTG CTT CCT TTC AAC CAC CGC CAA GTG 2105
Lys Trp Tyr Lys Asn Ser Asn Leu Leu Pro Phe Asn His Arg Gln Val
540 545 550

GCA TTT GAG AAC AAT GGA ACT CTT AAA CTT TCA GAT GTG CAA AAG GAA 2153
Ala Phe Glu Asn Asn Gly Thr Leu Lys Leu Ser Asp Val Gln Lys Glu
555 560 565

GTG GAC GAG GGG GAG TAC ACG TGC AAC GTG TTG GTT CAA CCA CAA CTC 2201
Val Asp Glu Gly Glu Tyr Thr Cys Asn Val Leu Val Gln Pro Gln Leu
570 575 580

TCC ACC AGC CAG AGC GTC CAC GTG ACC GTG AAA GTT CCG CCT TTC ATA 2249
Ser Thr Ser Gln Ser Val His Val Thr Val Lys Val Pro Pro Phe Ile
585 590 595

CAA CCC TTT GAG TTT CCA AGA TTC TCC ATT GGG CAG CGG GTC TTC ATC 2297
Gln Pro Phe Glu Phe Pro Arg Phe Ser Ile Gly Gln Arg Val Phe Ile
600 605 610 615

CCC TGT GTT GTG GTC TCA GGG GAC TTA CCC ATC ACG ATC ACC TGG CAG 2345
Pro Cys Val Val Val Ser Gly Asp Leu Pro Ile Thr Ile Thr Trp Gln
620 625 630

AAG GAT GGC CGG CCA ATC CCT GGG AGC CTT GGG GTG ACC ATT GAC AAT 2393
Lys Asp Gly Arg Pro Ile Pro Gly Ser Leu Gly Val Thr Ile Asp Asn
635 640 645

ATT GAC TTC ACG AGC TCC TTG AGG ATT TCC AAT CTC TCG CTC ATG CAC 2441
Ile Asp Phe Thr Ser Ser Leu Arg Ile Ser Asn Leu Ser Leu Met His
650 655 660

AAT GGG AAT TAC ACC TGC ATA GCC CGG AAT GAG GCC GCC GCT GTG GAG 2489
Asn Gly Asn Tyr Thr Cys Ile Ala Arg Asn Glu Ala Ala Ala Val Glu
665 670 675

CAC CAA AGC CAG TTG ATT GTC AGA GTT CCT CCC AAG TTT GTG GTT CAG 2537
His Gln Ser Gln Leu Ile Val Arg Val Pro Pro Lys Phe Val Val Gln
680 685 690 695

CCA CGG GAC CAG GAC GGG ATT TAT GGC AAA GCA GTC ATC CTC AAT TGT 2585
Pro Arg Asp Gln Asp Gly Ile Tyr Gly Lys Ala Val Ile Leu Asn Cys
700 705 710

TCT GCT GAG GGT TAC CCT GTA CCT ACC ATC GTG TGG AAA TTC TCT AAA 2633
Ser Ala Glu Gly Tyr Pro Val Pro Thr Ile Val Trp Lys Phe Ser Lys
715 720 725

GGT GCT GGG GTT CCC CAG TTC CAG CCA ATT GCC CTA AAT GGC CGA ATC 2681
Gly Ala Gly Val Pro Gln Phe Gln Pro Ile Ala Leu Asn Gly Arg Ile
730 735 740

CAA GTT CTC AGC AAT GGG TCG TTG CTG ATC AAG CAT GTC GTG GAG GAA 2729
Gln Val Leu Ser Asn Gly Ser Leu Leu Ile Lys His Val Val Glu Glu
745 750 755

GAC AGT GGC TAC TAC CTC TGC AAG GTC AGC AAC GAT GTG GGC GCA GAC 2777
Asp Ser Gly Tyr Tyr Leu Cys Lys Val Ser Asn Asp Val Gly Ala Asp
760 765 770 775

GTC AGC AAG TCC ATG TAC CTC ACG GTT AAA ATT CCT GCG ATG ATA ACA 2825
Val Ser Lys Ser Met Tyr Leu Thr Val Lys Ile Pro Ala Met Ile Thr
780 785 790

TCC TAT CCA AAT ACT ACC CTG GCC ACG CAG GGG CAG AAA AAG GAG ATG 2873
Ser Tyr Pro Asn Thr Thr Leu Ala Thr Gln Gly Gln Lys Lys Glu Met
795 800 805

AGC TGC ACG GCG CAT GGT GAG AAG CCC ATT ATA GTC CGC TGG GAG AAG 2921
Ser Cys Thr Ala His Gly Glu Lys Pro Ile Ile Val Arg Trp Glu Lys
810 815 820

GA GAC CGA ATC ATT AAC CCT GAG ATG GCC CGT TAT CTT GTG TCC ACC 2969
Glu Asp Arg Ile Ile Asn Pro Glu Met Ala Arg Tyr Leu Val Ser Thr
825 830 835

AAG GAG GTG GGA GAA GAG GTG ATT TCT ACT CTG CAG ATT TTG CCA ACT 3017
Lys Glu Val Gly Glu Glu Val Ile Ser Thr Leu Gln Ile Leu Pro Thr
840 845 850 855

GTG AGA GAA GAT TCT GGT TTC TTT TCC TGC CAT GCT ATT AAT TCT TAT 3065
Val Arg Glu Asp Ser Gly Phe Phe Ser Cys His Ala Ile Asn Ser Tyr
860 865 870

GGG GAG GAC CGT GGA ATA ATT CAG CTC ACA GTG CAA GAG CCC CCA GAC 3113
Gly Glu Asp Arg Gly Ile Ile Gln Leu Thr Val Gln Glu Pro Pro Asp
875 880 885

CCT CCC GAA ATT GAG ATC AAA GAT GTC AAA GCA CGC ACA ATT ACG CTC 3161
Pro Pro Glu Ile Glu Ile Lys Asp Val Lys Ala Arg Thr Ile Thr Leu
890 895 900

AGG TGG ACC ATG GGG TTT GAT GGA AAC AGT CCC ATC ACA GGC TAC GAT 3209
Arg Trp Thr Met Gly Phe Asp Gly Asn Ser Pro Ile Thr Gly Tyr Asp
905 910 915

ATT GAA TGC AAA AAT AAA TCA GAC TCC TGG GAT TCT GCT CAG AGA ACC 3257
Ile Glu Cys Lys Asn Lys Ser Asp Ser Trp Asp Ser Ala Gln Arg Thr
920 925 930 935

AAA GAT GTT TCC CCT CAG CTG AAC TCG GCC ACC ATC ATT GAT ATC CAC 3305
Lys Asp Val Ser Pro Gln Leu Asn Ser Ala Thr Ile Ile Asp Ile His
940 945 950

CCT TCC TCC ACC TAC AGC ATC CGC ATG TAC GCC AAG AAC CGG ATT GGC 3353
Pro Ser Ser Thr Tyr Ser Ile Arg Met Tyr Ala Lys Asn Arg Ile Gly
955 960 965

AAG AGC GAG CCC AGC AAC GAG CTC ACC ATC ACG GCG GAC GAG GCA GCT 3401
Lys Ser Glu Pro Ser Asn Glu Leu Thr Ile Thr Ala Asp Glu Ala Ala
970 975 980

CCT GAT GGT CCA CCT CAG GAA GTT CAC CTG GAG CCT ATA TCA TCT CAG 3449
Pro Asp Gly Pro Pro Gln Glu Val His Leu Glu Pro Ile Ser Ser Gln
985 990 995

AGC ATC AGG GTC ACA TGG AAG GCT CCC AAG AAA CAT TTG CAA AAT GGG 3497
Ser Ile Arg Val Thr Trp Lys Ala Pro Lys Lys His Leu Gln Asn Gly
1000 1005 1010 1015

ATT ATC CGT GGC TAC CAA ATA GGT TAC CGA GAG TAC AGC ACT GGG GGT 3545
Ile Ile Arg Gly Tyr Gln Ile Gly Tyr Arg Glu Tyr Ser Thr Gly Gly
1020 1025 1030

AAC TTC CAA TTC AAC ATT ATC AGT GTC GAC ACC AGC GGG GAC AGT GAG 3593
Asn Phe Gln Phe Asn Ile Ile Ser Val Asp Thr Ser Gly Asp Ser Glu
1035 1040 1045

GTT TAC ACC CTG GAC AAC CTG AAT AAG TTC ACT CAG TAC GGC CTG GTG 3641
Val Tyr Thr Leu Asp Asn Leu Asn Lys Phe Thr Gln Tyr Gly Leu Val
1050 1055 1060

GTG CAG GCC TGT AAC CGG GCC GGC ACG GGG CCT TCT TCT CAG GAA ATC 3689
Val Gln Ala Cys Asn Arg Ala Gly Thr Gly Pro Ser Ser Gln Glu Ile
1065 1070 1075

ATC ACC ACC ACT CTC GAG GAT GTG CCC AGT TAC CCC CCC GAA AAT GTC 3737
Ile Thr Thr Thr Leu Glu Asp Val Pro Ser Tyr Pro Pro Glu Asn Val
1080 1085 1090 1095

CAA GCC ATA GCA ACA TCA CCA GAA AGC ATA TCA ATA TCC TGG TCC ACA 3785
Gln Ala Ile Ala Thr Ser Pro Glu Ser Ile Ser Ile Ser Trp Ser Thr
1100 1105 1110

CTT TCC AAG GAA GCC TTG AAT GGA ATT CTC CAG GGG TTC AGA GTC ATT 3833
Leu Ser Lys Glu Ala Leu Asn Gly Ile Leu Gln Gly Phe Arg Val Ile
1115 1120 1125

TAC TGG GCC AAC CTC ATG GAC GGA GAG CTG GGT GAG ATT AAA AAC ATC 3881
Tyr Trp Ala Asn Leu Met Asp Gly Glu Leu Gly Glu Ile Lys Asn Ile
1130 1135 1140

ACC ACC ACA CAG CCT TCA CTG GAG CTG GAC GGG CTG GAA AAG TAC ACC 3929
Thr Thr Thr Gln Pro Ser Leu Glu Leu Asp Gly Leu Glu Lys Tyr Thr
1145 1150 1155

AAC TAC AGC ATC CAG GTG CTG GCC TTC ACC CGC GCA GGA GAC GGG GTC 3977
Asn Tyr Ser Ile Gln Val Leu Ala Phe Thr Arg Ala Gly Asp Gly Val
1160 1165 1170 1175

AGG AGT GAG CAG ATC TTC ACC CGG ACC AAA GAG GAT GTT CCA GGT CCT 4025
Arg Ser Glu Gln Ile Phe Thr Arg Thr Lys Glu Asp Val Pro Gly Pro
1180 1185 1190

CCC GCG GGT GTG AAG GCA GCG GCG GCC TCA GCC TCC ATG GTC TTT GTG 4073
Pro Ala Gly Val Lys Ala Ala Ala Ala Ser Ala Ser Met Val Phe Val
1195 1200 1205

TCC TGG CTT CCC CCT CTC AAG CTG AAC GGC ATC ATC CGA AAG TAC ACT 4121
Ser Trp Leu Pro Pro Leu Lys Leu Asn Gly Ile Ile Arg Lys Tyr Thr
1210 1215 1220

GT TTC TGC TCC CAC CCC TAT CCC ACA GTG ATC AGC GAG TTT GAG GCC 4169
Val Phe Cys Ser His Pro Tyr Pro Thr Val Ile Ser Glu Phe Glu Ala
1225 1230 1235

TCT CCC GAC TCG TTT TCC TAC AGA ATT CCC AAC CTG AGT AGG AAT CGT 4217
Ser Pro Asp Ser Phe Ser Tyr Arg Ile Pro Asn Leu Ser Arg Asn Arg
1240 1245 1250 1255

CAG TAC AGC GTC TGG GTG GTG GCT GTT ACT TCA GCC GGA AGA GGC AAC 4265
Gln Tyr Ser Val Trp Val Val Ala Val Thr Ser Ala Gly Arg Gly Asn
1260 1265 1270

AGC AGT GAA ATC ATC ACA GTC GAG CCA CTA GCA AAA GCT CCT GCA CGA 4313
Ser Ser Glu Ile Ile Thr Val Glu Pro Leu Ala Lys Ala Pro Ala Arg
1275 1280 1285

ATC CTG ACC TTC AGT GGG ACA GTG ACT ACT CCA TGG ATG AAA GAC ATT 4361
Ile Leu Thr Phe Ser Gly Thr Val Thr Thr Pro Trp Met Lys Asp Ile
1290 1295 1300

GTC TTG CCT TGT AAG GCT GTT GGG GAC CCT TCT CCT GCA GTC AAA TGG 4409
Val Leu Pro Cys Lys Ala Val Gly Asp Pro Ser Pro Ala Val Lys Trp
1305 1310 1315

ATG AAA GAC AGT AAC GGG ACA CCC AGT CTA GTA ACG ATT GAT GGG CGG 4457
Met Lys Asp Ser Asn Gly Thr Pro Ser Leu Val Thr Ile Asp Gly Arg
1320 1325 1330 1335

AGG AGC ATC TTT AGC AAC GGA AGC TTC ATT ATT CGC ACG GTG AAA GCA 4505
Arg Ser Ile Phe Ser Asn Gly Ser Phe Ile Ile Arg Thr Val Lys Ala
1340 1345 1350

GAA GAC TCC GGC TAT TAC AGC TGC ATT GCC AAT AAC AAC TGG GGA TCT 4553
Glu Asp Ser Gly Tyr Tyr Ser Cys Ile Ala Asn Asn Asn Trp Gly Ser
1355 1360 1365

GAT GAA ATT ATT TTA AAC TTA CAA GTA CAA GTT CCA CCA GAT CAG CCT 4601
Asp Glu Ile Ile Leu Asn Leu Gln Val Gln Val Pro Pro Asp Gln Pro
1370 1375 1380

CGG CTT ACA GTC TCC AAG ACC ACG TCT TCC TCC ATC ACC CTT TCT TGG 4649
Arg Leu Thr Val Ser Lys Thr Thr Ser Ser Ser Ile Thr Leu Ser Trp
1385 1390 1395

CTC CCT GGA GAC AAC GGG GGC AGC TCT ATC AGA GGA TAC ATA CTG CAG 4697
Leu Pro Gly Asp Asn Gly Gly Ser Ser Ile Arg Gly Tyr Ile Leu Gln
1400 1405 1410 1415

TAC TCC GAG GAC AAT AGT GAG CAG TGG GGG AGT TTT CCA ATC AGC CCC 4745
Tyr Ser Glu Asp Asn Ser Glu Gln Trp Gly Ser Phe Pro Ile Ser Pro
1420 1425 1430

AGC GAA CGT TCC TAT CGC TTG GAA AAT CTC AAA TGT GGG ACT TGG TAT 4793
Ser Glu Arg Ser Tyr Arg Leu Glu Asn Leu Lys Cys Gly Thr Trp Tyr
1435 1440 1445

AAG TTC ACA CTG ACA GCC CAA AAT GGA GTG GGC CCA GGG CGC ATA AGT 4841
Lys Phe Thr Leu Thr Ala Gln Asn Gly Val Gly Pro Gly Arg Ile Ser
1450 1455 1460

GAA ATC ATA GAA GCA AAG ACC TTA GGA AAA GAG CCC CAG TTC TCA AAG 4889
Glu Ile Ile Glu Ala Lys Thr Leu Gly Lys Glu Pro Gln Phe Ser Lys
1465 1470 1475

GAG CAG GAG CTG TTT GCC AGC ATC AAC ACC ACA CGC GTG AGG CTG AAC 4937
Glu Gln Glu Leu Phe Ala Ser Ile Asn Thr Thr Arg Val Arg Leu Asn
1480 1485 1490 1495

CTC ATT GGC TGG AAT GAT GGC GGC TGC CCC ATC ACC TCC TTC ACA CTA 4985
Leu Ile Gly Trp Asn Asp Gly Gly Cys Pro Ile Thr Ser Phe Thr Leu
1500 1505 1510

GAG TAC AGG CCC TTT GGG ACC ACA GTT TGG ACC ACA GCT CAG AGG ACC 5033
Glu Tyr Arg Pro Phe Gly Thr Thr Val Trp Thr Thr Ala Gln Arg Thr
1515 1520 1525

TCT CTC TCC AAG TCC TAC ATC CTG TAT GAC CTG CAG GAA GCC ACC TGG 5081
Ser Leu Ser Lys Ser Tyr Ile Leu Tyr Asp Leu Gln Glu Ala Thr Trp
1530 1535 1540

TAT GAG CTG CAG ATG CGG GTG TGC AAC AGT GCG GGC TGC GCG GAG AAG 5129
Tyr Glu Leu Gln Met Arg Val Cys Asn Ser Ala Gly Cys Ala Glu Lys
1545 1550 1555

CAG GCT AAA GAG GCT GCG AGA TGC AAA GAG TTT AGC TGAAATGCTC 5175
Gln Ala Lys Glu Ala Ala Arg Cys Lys Glu Phe Ser
1560 1565 1570

ATGAGTAAGA ATACCCGGAC TTCAGATACG TTAAGCAAGC AACAGCAGAC CCTGCGAATG 5235 

CACATCGACA TACCCAGGGC TCAGCTTTTG ATTGAAGAGA GAGACACGAT GGAGACCATT 5295 

GATGATCGCT CCACGGTTCT GTTGACGGAT GCTGACTTTG GAGAGGCAGC TAAGCAGAAG 5355 

TCCCTGACGG TCACTCACAC GGTCCATTAC CAATCGGTGT CTCAGGCCAC TGGGCCCTTA 5415 

GTGGATGTTT CAGACGCTCG GCCGGGAACG AATCCCACCA CCAGGAGGAA TGCCAAGGCT 5475 

GGGCCCACAG CGAGAAACCG CTATGCCAGC CAGTGGACCC TCAACCGACC CCACCCCACC 5535 

ATCTCAGCAC ACACCCTCAC CACAGACTGG AGGCTGCCAA CACCCAGGGC TGCAGGATCA 5595 

GTAGACAAAG AGAGCGACAG TTACAGCGTC AGCCCCTCGC AAGACACAGA TCGAGCAAGA 5655 

AGCAGCATGG TCTCCACAGA AAGTGCCTCC [illegible] AAGAACTGGC CAGGGCCTAC 5715

GAACACGCCA AGATGGAAGA GCAACTGAGG [illegible] TCACCATCAC GGAGTGCTTC 5775

ATATCAGACA CGTCATCGGA GCAGTTGACG [illegible] ATGAGTACAC GGACAGTCTG 5835

ACCTCCAGCA CCCCTTCCGA ATCGGGAATC [illegible] CTGCATCTCC CCCCAAACCT 5895

CAGGATGGAG GAAGAGTAAT GAATATGGCA [illegible] CAATCGGCCA GGTGACCTCA 5955

TACATTTGLC TCCATACCTT AGAATGGACT TTTTGTTAAA CCGAGGTGGT CCAGGCACCA 6015

GCAGGGACCT GAGCTTAGGA CAAGCATGCT TGGAACCTCA GAAAAGCCGG ACCCTGAAGC 6075

GCCCCACGGT CCTGGAGCCC ATCCCGATGG AAGCCGCCTC CTCCGCCTCC TCCACGAGAG 6135

AAGGACAGTC GTGGCAGCCG GGGGCCGTGG CCACATTACC TCAGCGGGAG GGAGCAGAGC 6195

TGGGACAGGC AGCTAAAATG AGCAGCTCCC AAGAATCACT GCTCGACTCC CGGGGCCATT 6255

TGAAAGGAAA CAATCCTTAC GCAAAATCTT ACACCCTGGT ATAACAGACA GCATGACTGG 6315

ACAGCGGTTG TAAATACAAT TCAAACAATT CAATCAAAGC TACCTTTTTT TTACGGAATT 6375

CCAATATTTA TAATTAAAGA AAATTGCCAA AATATATT 6413
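
The pairing of each codon row with the amino acid row printed beneath it in the listing above can be spot-checked mechanically. The following is a minimal Python sketch, assuming the listing uses the standard genetic code; the helper name `translate_row` and the table construction are illustrative and are not part of the application. It translates the codon row ending at nucleotide 1577 and compares the result with the residues printed under that row (read with the usual three-letter names).

```python
from itertools import product

# Standard genetic code in TCAG order (assumption: the listing uses the standard code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter residue names; "End" marks a stop codon.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",
}


def translate_row(codons: str) -> list[str]:
    """Translate a whitespace-separated row of DNA codons to three-letter residues."""
    return [ONE_TO_THREE[CODON_TABLE[codon]] for codon in codons.split()]


# Codon row ending at nucleotide 1577 and the residues printed beneath it.
row = "GTG AGG ATC ACA GGG ATC AAC CAC GAA AAC CTT ATA ATG GAT CAC ATG"
printed = "Val Arg Ile Thr Gly Ile Asn His Glu Asn Leu Ile Met Asp His Met"

print(translate_row(row) == printed.split())  # True when the two rows agree
```

The same check can be run on any other row; a mismatch would point either to a transcription error in the codons or to a misread residue name.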



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1571 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Trp Ile Leu Ala Leu Ser Leu Phe Gln Ser Phe Ala Asn Val Phe
1 5 10 15

Ser Glu Asp Leu His Ser Ser Leu Tyr Phe Val Asn Ala Ser Leu Gln
20 25 30

Glu Val Val Phe Ala Ser Thr Thr Gly Thr Leu Val Pro Cys Pro Ala
35 40 45

Ala Gly Ile Pro Pro Val Thr Leu Arg Trp Tyr Leu Ala Thr Gly Glu
50 55 60

Glu Ile Tyr Asp Val Pro Gly Ile Arg His Val His Pro Asn Gly Thr
65 70 75 80

Leu Gln Ile Phe Pro Phe Pro Pro Ser Ser Phe Ser Thr Leu Ile His
85 90 95

Asp Asn Thr Tyr Tyr Cys Thr Ala Glu Asn Pro Ser Gly Lys Ile Arg
100 105 110

Ser Gln Asp Val His Ile Lys Ala Val Leu Arg Glu Pro Tyr Thr Val
115 120 125

Arg Val Glu Asp Gln Lys Thr Met Arg Gly Asn Val Ala Val Phe Lys
130 135 140

Cys Ile Ile Pro Ser Ser Val Glu Ala Tyr Ile Thr Val Val Ser Trp
145 150 155 160

Glu Lys Asp Thr Val Ser Leu Val Ser Gly Ser Arg Phe Leu Ile Thr
165 170 175

Ser Thr Gly Ala Leu Tyr Ile Lys Asp Val Gln Asn Glu Asp Gly Leu
180 185 190

Tyr Asn Tyr Arg Cys Ile Thr Arg His Arg Tyr Thr Gly Glu Thr Arg
195 200 205

Gln Ser Asn Ser Ala Arg Leu Phe Val Ser Asp Pro Ala Asn Ser Ala
210 215 220

Pro Ser Ile Leu Asp Gly Phe Asp His Arg Lys Ala Met Ala Gly Gln
225 230 235 240

Arg Val Glu Leu Pro Cys Lys Ala Leu Gly His Pro Glu Pro Asp Tyr
245 250 255

Arg Trp Leu Lys Asp Asn Met Pro Leu Glu Leu Ser Gly Arg Phe Gln
260 265 270

Lys Thr Val Thr Gly Leu Leu Ile Glu Asn Ile Arg Pro Ser Asp Ser
275 280 285

Gly Ser Tyr Val Cys Glu Val Ser Asn Arg Tyr Gly Thr Ala Lys Val
290 295 300

Ile Gly Arg Leu Tyr Val Lys Gln Pro Leu Lys Ala Thr Ile Ser Pro
305 310 315 320

Arg Lys Val Lys Ser Ser Val Gly Ser Gln Val Ser Leu Ser Cys Ser
325 330 335

Val Thr Gly Thr Glu Asp Gln Glu Leu Ser Trp Tyr Arg Asn Gly Glu
340 345 350

Ile Leu Asn Pro Gly Lys Asn Val Arg Ile Thr Gly Ile Asn His Glu
355 360 365

Asn Leu Ile Met Asp His Met Val Lys Ser Asp Gly Gly Ala Tyr Gln
370 375 380

Cys Phe Val Arg Lys Asp Lys Leu Ser Ala Gln Asp Tyr Val Gln Val
385 390 395 400

Val Leu Glu Asp Gly Thr Pro Lys Ile Ile Ser Ala Phe Ser Glu Lys
405 410 415

Val Val Ser Pro Ala Glu Pro Val Ser Leu Met Cys Asn Val Lys Gly
420 425 430

Thr Pro Leu Pro Thr Ile Thr Trp Thr Leu Asp Asp Asp Pro Ile Leu
435 440 445

Lys Gly Gly Ser His Arg Ile Ser Gln Met Ile Thr Ser Glu Gly Asn
450 455 460

Val Val Ser Tyr Leu Asn Ile Ser Ser Ser Gln Val Arg Asp Gly Gly
465 470 475 480

Val Tyr Arg Cys Thr Ala Asn Asn Ser Ala Gly Val Val Leu Tyr Gln
485 490 495

Ala Arg Ile Asn Val Arg Gly Pro Ala Ser Ile Arg Pro Met Lys Asn
500 505 510

Ile Thr Ala Ile Ala Gly Arg Asp Thr Tyr Ile His Cys Arg Val Ile
515 520 525

Gly Tyr Pro Tyr Tyr Ser Ile Lys Trp Tyr Lys Asn Ser Asn Leu Leu
530 535 540

Pro Phe Asn His Arg Gln Val Ala Phe Glu Asn Asn Gly Thr Leu Lys
545 550 555 560

Leu Ser Asp Val Gln Lys Glu Val Asp Glu Gly Glu Tyr Thr Cys Asn
565 570 575

Val Leu Val Gln Pro Gln Leu Ser Thr Ser Gln Ser Val His Val Thr
580 585 590

Val Lys Val Pro Pro Phe Ile Gln Pro Phe Glu Phe Pro Arg Phe Ser
595 600 605

Ile Gly Gln Arg Val Phe Ile Pro Cys Val Val Val Ser Gly Asp Leu
610 615 620

Pro Ile Thr Ile Thr Trp Gln Lys Asp Gly Arg Pro Ile Pro Gly Ser
625 630 635 640

Leu Gly Val Thr Ile Asp Asn Ile Asp Phe Thr Ser Ser Leu Arg Ile
645 650 655

Ser Asn Leu Ser Leu Met His Asn Gly Asn Tyr Thr Cys Ile Ala Arg
660 665 670

Asn Glu Ala Ala Ala Val Glu His Gln Ser Gln Leu Ile Val Arg Val
675 680 685

Pro Pro Lys Phe Val Val Gln Pro Arg Asp Gln Asp Gly Ile Tyr Gly
690 695 700

Lys Ala Val Ile Leu Asn Cys Ser Ala Glu Gly Tyr Pro Val Pro Thr
705 710 715 720

Ile Val Trp Lys Phe Ser Lys Gly Ala Gly Val Pro Gln Phe Gln Pro
725 730 735

Ile Ala Leu Asn Gly Arg Ile Gln Val Leu Ser Asn Gly Ser Leu Leu
740 745 750

Ile Lys His Val Val Glu Glu Asp Ser Gly Tyr Tyr Leu Cys Lys Val
755 760 765

Ser Asn Asp Val Gly Ala Asp Val Ser Lys Ser Met Tyr Leu Thr Val
770 775 780

Lys Ile Pro Ala Met Ile Thr Ser Tyr Pro Asn Thr Thr Leu Ala Thr
785 790 795 800

Gln Gly Gln Lys Lys Glu Met Ser Cys Thr Ala His Gly Glu Lys Pro
805 810 815

Ile Ile Val Arg Trp Glu Lys Glu Asp Arg Ile Ile Asn Pro Glu Met
820 825 830

Ala Arg Tyr Leu Val Ser Thr Lys Glu Val Gly Glu Glu Val Ile Ser
835 840 845

Thr Leu Gln Ile Leu Pro Thr Val Arg Glu Asp Ser Gly Phe Phe Ser
850 855 860

Cys His Ala Ile Asn Ser Tyr Gly Glu Asp Arg Gly Ile Ile Gln Leu
865 870 875 880

Thr Val Gln Glu Pro Pro Asp Pro Pro Glu Ile Glu Ile Lys Asp Val
885 890 895

Lys Ala Arg Thr Ile Thr Leu Arg Trp Thr Met Gly Phe Asp Gly Asn
900 905 910

Ser Pro Ile Thr Gly Tyr Asp Ile Glu Cys Lys Asn Lys Ser Asp Ser
915 920 925

Trp Asp Ser Ala Gln Arg Thr Lys Asp Val Ser Pro Gln Leu Asn Ser
930 935 940

Ala Thr Ile Ile Asp Ile His Pro Ser Ser Thr Tyr Ser Ile Arg Met
945 950 955 960

Tyr Ala Lys Asn Arg Ile Gly Lys Ser Glu Pro Ser Asn Glu Leu Thr
965 970 975

Ile Thr Ala Asp Glu Ala Ala Pro Asp Gly Pro Pro Gln Glu Val His
980 985 990

Leu Glu Pro Ile Ser Ser Gln Ser Ile Arg Val Thr Trp Lys Ala Pro
995 1000 1005

Lys Lys His Leu Gln Asn Gly Ile Ile Arg Gly Tyr Gln Ile Gly Tyr
1010 1015 1020

Arg Glu Tyr Ser Thr Gly Gly Asn Phe Gln Phe Asn Ile Ile Ser Val
1025 1030 1035 1040

Asp Thr Ser Gly Asp Ser Glu Val Tyr Thr Leu Asp Asn Leu Asn Lys
1045 1050 1055

Phe Thr Gln Tyr Gly Leu Val Val Gln Ala Cys Asn Arg Ala Gly Thr
1060 1065 1070

Gly Pro Ser Ser Gln Glu Ile Ile Thr Thr Thr Leu Glu Asp Val Pro
1075 1080 1085

Ser Tyr Pro Pro Glu Asn Val Gln Ala Ile Ala Thr Ser Pro Glu Ser
1090 1095 1100

Ile Ser Ile Ser Trp Ser Thr Leu Ser Lys Glu Ala Leu Asn Gly Ile
1105 1110 1115 1120

Leu Gln Gly Phe Arg Val Ile Tyr Trp Ala Asn Leu Met Asp Gly Glu
1125 1130 1135

Leu Gly Glu Ile Lys Asn Ile Thr Thr Thr Gln Pro Ser Leu Glu Leu
1140 1145 1150

Asp Gly Leu Glu Lys Tyr Thr Asn Tyr Ser Ile Gln Val Leu Ala Phe
1155 1160 1165

Thr Arg Ala Gly Asp Gly Val Arg Ser Glu Gln Ile Phe Thr Arg Thr
1170 1175 1180

Lys Glu Asp Val Pro Gly Pro Pro Ala Gly Val Lys Ala Ala Ala Ala
1185 1190 1195 1200

Ser Ala Ser Met Val Phe Val Ser Trp Leu Pro Pro Leu Lys Leu Asn
1205 1210 1215

Gly Ile Ile Arg Lys Tyr Thr Val Phe Cys Ser His Pro Tyr Pro Thr
1220 1225 1230

Val Ile Ser Glu Phe Glu Ala Ser Pro Asp Ser Phe Ser Tyr Arg Ile
1235 1240 1245

Pro Asn Leu Ser Arg Asn Arg Gln Tyr Ser Val Trp Val Val Ala Val
1250 1255 1260

Thr Ser Ala Gly Arg Gly Asn Ser Ser Glu Ile Ile Thr Val Glu Pro
1265 1270 1275 1280

Leu Ala Lys Ala Pro Ala Arg Ile Leu Thr Phe Ser Gly Thr Val Thr
1285 1290 1295

Thr Pro Trp Met Lys Asp Ile Val Leu Pro Cys Lys Ala Val Gly Asp
1300 1305 1310

Pro Ser Pro Ala Val Lys Trp Met Lys Asp Ser Asn Gly Thr Pro Ser
1315 1320 1325

Leu Val Thr Ile Asp Gly Arg Arg Ser Ile Phe Ser Asn Gly Ser Phe
1330 1335 1340

Ile Ile Arg Thr Val Lys Ala Glu Asp Ser Gly Tyr Tyr Ser Cys Ile
1345 1350 1355 1360

Ala Asn Asn Asn Trp Gly Ser Asp Glu Ile Ile Leu Asn Leu Gln Val
1365 1370 1375

Gln Val Pro Pro Asp Gln Pro Arg Leu Thr Val Ser Lys Thr Thr Ser
1380 1385 1390

Ser Ser Ile Thr Leu Ser Trp Leu Pro Gly Asp Asn Gly Gly Ser Ser
1395 1400 1405

Ile Arg Gly Tyr Ile Leu Gln Tyr Ser Glu Asp Asn Ser Glu Gln Trp
1410 1415 1420

Gly Ser Phe Pro Ile Ser Pro Ser Glu Arg Ser Tyr Arg Leu Glu Asn
1425 1430 1435 1440

Leu Lys Cys Gly Thr Trp Tyr Lys Phe Thr Leu Thr Ala Gln Asn Gly
1445 1450 1455

Val Gly Pro Gly Arg Ile Ser Glu Ile Ile Glu Ala Lys Thr Leu Gly
1460 1465 1470

Lys Glu Pro Gln Phe Ser Lys Glu Gln Glu Leu Phe Ala Ser Ile Asn
1475 1480 1485

Thr Thr Arg Val Arg Leu Asn Leu Ile Gly Trp Asn Asp Gly Gly Cys
1490 1495 1500

Pro Ile Thr Ser Phe Thr Leu Glu Tyr Arg Pro Phe Gly Thr Thr Val
1505 1510 1515 1520

Trp Thr Thr Ala Gln Arg Thr Ser Leu Ser Lys Ser Tyr Ile Leu Tyr
1525 1530 1535

Asp Leu Gln Glu Ala Thr Trp Tyr Glu Leu Gln Met Arg Val Cys Asn
1540 1545 1550

Ser Ala Gly Cys Ala Glu Lys Gln Ala Lys Glu Ala Ala Arg Cys Lys
1555 1560 1565

Glu Phe Ser
1570

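
The stated length of SEQ ID NO: 11 (1571 amino acids) can be reconciled with the coding range recited in the claims for SEQ ID NO: 10 (nucleotides 453-5168) by simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the range is inclusive, that SEQ ID NO: 11 is the translation of that range, and that the final codon of the range is the TGA stop that follows the last Ser codon in the nucleotide listing.

```python
def coding_region_summary(start: int, end: int) -> tuple[int, int, int]:
    """Return (length in nucleotides, whole codons, leftover bases) for an inclusive range."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


def codon_end(residue: int, cds_start: int = 453) -> int:
    """Nucleotide position of the last base of the codon for a 1-based residue number."""
    return cds_start + 3 * residue - 1


length, codons, remainder = coding_region_summary(453, 5168)
print(length, codons, remainder)        # 4716 1572 0
print(codons - 1)                       # 1571 residues once the stop codon is excluded
print(codon_end(55), codon_end(1571))   # 617 5165
```

Under these assumptions the numbers agree with the listing: the codon row that ends with Thr 55 is labeled 617, and the last Ser codon ends at nucleotide 5165, leaving positions 5166-5168 for the stop codon.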
That which is claimed is:

1. Isolated nucleic acid encoding a mammalian DS-CAM member of
the Immunoglobulin (Ig) superfamily of proteins, or a fragment
thereof, wherein said DS-CAM comprises at least 7 Ig-like domains.

2. Isolated nucleic acid according to claim 1, wherein said
nucleic acid, or fragments thereof, is selected from:

(a) DNA encoding the amino acid sequence set forth in
SEQ ID NO: 2 or SEQ ID NO: 11, or the DS-CAM coding region of
SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9,

(b) DNA that hybridizes to the DNA of (a) under moderately
stringent conditions, wherein said DNA encodes biologically
active DS-CAM, or

(c) DNA degenerate with respect to either (a) or (b) above,
wherein said DNA encodes biologically active DS-CAM.

3. A nucleic acid according to claim 2, wherein said nucleic
acid hybridizes under high stringency conditions to the DS-CAM
coding portion of nucleotides SEQ ID NO: 1, SEQ ID NO: 7,
SEQ ID NO: 8, SEQ ID NO: 9 or SEQ ID NO: 10.

4. A nucleic acid according to claim 2, wherein the nucleotide
sequence of said nucleic acid is substantially the same as that
set forth in SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8,
SEQ ID NO: 9 or SEQ ID NO: 10.

5. A nucleic acid according to claim 2, wherein the nucleotide
sequence of said nucleic acid is the same as that set forth in
SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9 or
SEQ ID NO: 10.

6. A nucleic acid according to claim 2, wherein said nucleic
acid is cDNA.

7. A vector containing the nucleic acid of claim 2.

8. Recombinant cells containing the nucleic acid of claim 2.

9. An oligonucleotide comprising at least 15 nucleotides capable
of specifically hybridizing with a sequence of nucleic acids of
the nucleotide sequence set forth in SEQ ID NO: 1, SEQ ID NO: 7,
SEQ ID NO: 8, SEQ ID NO: 9 or SEQ ID NO: 10.

10. An oligonucleotide according to claim 9, wherein said
oligonucleotide is labeled with a detectable marker.

11. An antisense oligonucleotide capable of specifically binding
to mRNA encoded by said nucleic acid according to claim 2.

12. A kit for detecting the presence of the DS-CAM cDNA sequence
comprising at least one oligonucleotide according to claim 10.

13. An isolated DS-CAM protein comprising at least 7 Ig-like
domains.

14. A DS-CAM protein according to claim 13, further
characterized by being expressed in a significantly higher
amount in brain versus lung, liver or kidney.

15. A DS-CAM protein according to claim 13, wherein the amino
acid sequence of said protein comprises substantially the same
protein sequence set forth in SEQ ID NO: 2 or SEQ ID NO: 11, or
the DS-CAM coding region of SEQ ID NO: 7, SEQ ID NO: 8 or
SEQ ID NO: 9.

16. A DS-CAM protein according to claim 15 comprising the same
amino acid sequence as the protein sequence set forth in
SEQ ID NO: 2 or SEQ ID NO: 11, or the DS-CAM coding region of
SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9.

17. A DS-CAM protein according to claim 13, wherein said protein
is encoded by a nucleotide sequence comprising substantially the
same nucleotide sequence set forth in SEQ ID NO: 1,
SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, or SEQ ID NO: 10.

18. A DS-CAM protein according to claim 17, wherein said protein
is encoded by a nucleotide sequence comprising SEQ ID NO: 1 or
SEQ ID NO: 10.

19. A DS-CAM protein according to claim 13, wherein said protein
is encoded by a nucleotide sequence that comprises substantially
the same nucleotide sequence as nucleotides 453-6185 set forth
in SEQ ID NO: 1, nucleotides 453-5168 set forth in
SEQ ID NO: 10, SEQ ID NO: 7, SEQ ID NO: 8 or SEQ ID NO: 9.

20. Method for expression of a DS-CAM-related protein, said
method comprising culturing cells of claim 8 under conditions
suitable for expression of said DS-CAM protein.

21. An isolated anti-DS-CAM antibody having specific reactivity
with a DS-CAM protein according to claim 13.

22. Antibody according to claim 21, wherein said antibody is a
monoclonal antibody.

23. An antibody according to claim 21, wherein said antibody is
a polyclonal antibody.

24. A composition comprising an amount of the antisense
oligonucleotide according to claim 11 effective to inhibit
expression of a DS-CAM protein and an acceptable hydrophobic
carrier capable of passing through a cell membrane.

25. A transgenic nonhuman mammal expressing exogenous nucleic
acid encoding a DS-CAM protein.

26. A transgenic nonhuman mammal according to claim 25, wherein
said nucleic acid encoding said DS-CAM protein has been mutated,
and wherein the DS-CAM protein so expressed is not native
DS-CAM.

27. A transgenic nonhuman mammal according to claim 25, wherein
the transgenic nonhuman mammal is a mouse.

28. A method for identifying nucleic acids encoding a mammalian
DS-CAM protein, said method comprising:

contacting a sample containing nucleic acids with an
oligonucleotide according to claim 9, wherein said contacting is
effected under high stringency hybridization conditions, and
identifying compounds which hybridize thereto.

29. A method for detecting the presence of a mammalian DS-CAM
protein in a sample, said method comprising contacting a test
sample with an antibody according to claim 21, detecting the
presence of an antibody-DS-CAM complex, and therefore detecting
the presence of a mammalian DS-CAM in said test sample.

30. Single strand DNA primers for amplification of DS-CAM
nucleic acid, wherein said primers comprise a nucleic acid
sequence derived from the nucleic acid sequence set forth as
SEQ ID NO: 1, SEQ ID NO: 10, SEQ ID NO: 7, SEQ ID NO: 8 or
SEQ ID NO: 9.